bcms-bertic

Maintained By
classla

BERTić (BERT-ich)

Property      Value
License       Apache 2.0
Languages     Bosnian, Croatian, Montenegrin, Serbian
Architecture  ELECTRA-based Transformer

What is bcms-bertic?

BERTić is a transformer language model designed for Bosnian, Croatian, Montenegrin and Serbian (BCMS). It was trained on over 8 billion tokens of text in these closely related South Slavic languages. The model's name incorporates the "-ić" suffix, which is common in diminutives and surnames across the region.

Implementation Details

Built on the ELECTRA architecture, BERTić outperforms multilingual BERT and CroSloEngual BERT on the tasks it has been evaluated on: part-of-speech tagging, named entity recognition, geolocation prediction, and commonsense causal reasoning. Reported results include:
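ELECTRA pretrains its main model as a discriminator: a small generator replaces some input tokens, and the discriminator predicts, for every position, whether the token is original or replaced. The sketch below only illustrates how those per-token labels are derived. The function name and the toy sentence are hypothetical, and this is not BERTić's actual training code.

```python
def replaced_token_labels(original, corrupted):
    """Return 1 for each position whose token was replaced, else 0."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    # A position counts as "replaced" only if the token actually differs;
    # if the generator happens to produce the original token, the label is 0.
    return [int(o != c) for o, c in zip(original, corrupted)]


# Toy example (hypothetical data): one token swapped by an imagined generator.
original = ["danas", "je", "lijep", "dan"]
corrupted = ["danas", "je", "ružan", "dan"]
labels = replaced_token_labels(original, corrupted)
print(labels)  # [0, 0, 1, 0]
```

Because every position carries a label, the discriminator receives a training signal from all tokens rather than only from the masked subset.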

  • Achieves up to 95.81% accuracy in Croatian POS tagging
  • Reaches 89.21% F1-score in Croatian NER tasks
  • Reaches a median distance error of 37.96 in geolocation prediction (lower is better)
  • Shows 65.76% accuracy on the COPA dataset for causal reasoning
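To make the geolocation metric concrete, here is a minimal sketch of a median distance error. It assumes that predictions and gold labels are (latitude, longitude) pairs in degrees and that distance is measured as great-circle distance in kilometres. The text above states neither the unit nor the distance formula, so both are assumptions, and the function names are illustrative.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_distance_error(gold, predicted):
    """Median of the per-example distances between gold and predicted coordinates."""
    distances = [haversine_km(*g, *p) for g, p in zip(gold, predicted)]
    return median(distances)
```

The median is less sensitive than the mean to a handful of predictions that land very far from the true location, which is one reason it is commonly reported for this kind of task.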

Core Capabilities

  • Part-of-speech tagging for standard and non-standard language varieties
  • Named entity recognition across multiple language variants
  • Geolocation prediction from social media text
  • Commonsense causal reasoning
  • Support for both formal and internet-based language varieties

Frequently Asked Questions

Q: What makes this model unique?

BERTić is the first transformer model specifically optimized for Bosnian, Croatian, Montenegrin and Serbian. It outperforms multilingual BERT and CroSloEngual BERT on the benchmarks listed above, and it was trained on more than 8 billion tokens of text in these languages.

Q: What are the recommended use cases?

The model is well suited to processing standard and non-standard BCMS text, including POS tagging, NER, text classification, and semantic analysis. It is particularly useful for applications that must handle both formal and informal language varieties.
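For token-level tasks such as POS tagging or NER, a typical starting point is to load the checkpoint with a token-classification head and fine-tune it on labelled data. The sketch below uses the Hugging Face `transformers` library and assumes the model is published on the Hub as `classla/bcms-bertic`. That identifier, the helper names, and the label count are assumptions. The classification head is freshly initialised, so its outputs are meaningless until the model has been fine-tuned.

```python
MODEL_ID = "classla/bcms-bertic"  # assumed Hugging Face Hub identifier


def load_token_classifier(num_labels):
    """Load the tokenizer and the encoder with an untrained token-classification head."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID, num_labels=num_labels)
    return tokenizer, model


def token_logits(text, tokenizer, model):
    """Return per-subword logits with shape (1, sequence_length, num_labels)."""
    batch = tokenizer(text, return_tensors="pt")  # requires PyTorch
    return model(**batch).logits
```

Calling `load_token_classifier(num_labels=9)` downloads the weights on first use. The value 9 is only a placeholder and should be replaced with the size of the tag set in your own dataset.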
