bert-base-bg-cs-pl-ru-cased

Maintained By
DeepPavlov

Property      Value
Parameters    180M
Architecture  12-layer, 768-hidden, 12-heads
Paper         ACL Anthology W19-3712
Author        DeepPavlov

What is bert-base-bg-cs-pl-ru-cased?

SlavicBERT (bert-base-bg-cs-pl-ru-cased) is a cased BERT model for four Slavic languages: Bulgarian, Czech, Polish, and Russian. It was initialized from multilingual BERT and further trained on Russian news and on Wikipedia data in these four languages. The model is case-sensitive, and its subtoken vocabulary was built from this training data.
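A minimal sketch of loading the model and computing sentence embeddings with the Hugging Face transformers library. The hub identifier `DeepPavlov/bert-base-bg-cs-pl-ru-cased` is assumed from the model name and maintainer listed above, and mean pooling is just one common way to get a single vector per sentence, not something this card prescribes. The function name `embed` is hypothetical.

```python
# Assumed hub identifier: maintainer name + model name from this card.
MODEL_ID = "DeepPavlov/bert-base-bg-cs-pl-ru-cased"


def embed(sentences):
    """Return one mean-pooled vector per sentence (shape: [batch, hidden])."""
    # Imported lazily so the module can be imported without the heavy deps.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()

    # The model is cased, so the input text is passed through unchanged.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # [batch, seq_len, hidden]

    # Average over real tokens only; padding positions are masked out.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Calling `embed(["Москва находится в России.", "Warszawa jest stolicą Polski."])` downloads the weights on first use and should return a tensor with one row per input sentence.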

Implementation Details

The model follows the BERT-base architecture: 12 transformer layers, a hidden size of 768, and 12 attention heads, for a total of 180M parameters. The customizations for Slavic languages are in the vocabulary and training data, not in the network shape. As of November 2021, the published checkpoint includes both Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) heads.

  • Subtoken vocabulary built from the Slavic training data
  • Trained on Russian news and on Wikipedia in the four target languages
  • Case-sensitive processing
  • Initialized from multilingual BERT
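The 180M figure can look high for a BERT-base shape, which is usually quoted at about 110M parameters. A rough count suggests the difference sits in the embedding table. The sketch below is plain arithmetic over the standard BERT encoder layout. It assumes a vocabulary of 119,547 subtokens, the size used by multilingual BERT. The actual SlavicBERT vocabulary size is not stated on this page, so treat that number as illustrative. The function name `bert_encoder_params` is hypothetical, and the MLM and NSP heads are not counted.

```python
def bert_encoder_params(vocab_size, hidden=768, layers=12,
                        intermediate=3072, max_positions=512, type_vocab=2):
    """Count weights and biases in a BERT encoder (embeddings, layers, pooler)."""
    layer_norm = 2 * hidden  # scale and bias

    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value, and output projections, plus the LayerNorm after attention.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (hidden -> intermediate -> hidden), plus a LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)

    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# English BERT-base vocabulary size versus the assumed multilingual-sized one.
for name, vocab in [("30,522-token vocab", 30522), ("119,547-token vocab (assumed)", 119547)]:
    print(f"{name}: {bert_encoder_params(vocab) / 1e6:.1f}M parameters")
```

Under these assumptions the transformer layers contribute the same count in both cases, and the larger vocabulary alone moves the total from roughly 109M to roughly 178M, which is consistent with the rounded 180M listed above.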

Core Capabilities

  • Covers Bulgarian, Czech, Polish, and Russian in a single model
  • Well suited to Named Entity Recognition tasks
  • Cross-lingual transfer among the four languages through a shared vocabulary and encoder
  • Contextual embeddings for Slavic-language text
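For Named Entity Recognition, the encoder is typically wrapped with a token-classification head and fine-tuned on labeled data. The following is a sketch of that setup, assuming the Hugging Face transformers library and the hub identifier `DeepPavlov/bert-base-bg-cs-pl-ru-cased`. The entity types and the helper names `bio_labels` and `load_token_classifier` are hypothetical and chosen only for illustration.

```python
# Assumed hub identifier: maintainer name + model name from this card.
MODEL_ID = "DeepPavlov/bert-base-bg-cs-pl-ru-cased"


def bio_labels(entity_types):
    """Build a BIO tag list: 'O' first, then B-/I- tags for each entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels += [f"B-{entity}", f"I-{entity}"]
    return labels


def load_token_classifier(entity_types):
    """Load the tokenizer and the encoder with a fresh token-classification head."""
    # Imported lazily so bio_labels can be used without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    labels = bio_labels(entity_types)
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # The classification head is randomly initialized and must be fine-tuned.
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_ID,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


# Illustrative entity types; a real project would use its dataset's tag set.
print(bio_labels(["PER", "LOC", "ORG"]))
```

The model returned by `load_token_classifier(["PER", "LOC", "ORG"])` will not produce useful tags until it has been trained on annotated examples in the target language or languages.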

Frequently Asked Questions

Q: What makes this model unique?

This model targets Slavic languages specifically: its vocabulary and additional training are focused on Bulgarian, Czech, Polish, and Russian, which is intended to give better results on tasks in these languages than general multilingual models. Because it was trained on news and Wikipedia text, it is best matched to applications that process similar written content.

Q: What are the recommended use cases?

The model is particularly well-suited for Named Entity Recognition tasks in Slavic languages, as well as general natural language understanding tasks like text classification, sentiment analysis, and sequence labeling in these languages. It's especially valuable for applications requiring cross-lingual transfer between Slavic languages.
