BERTA
| Property | Value |
|---|---|
| Model Type | Sentence Transformer |
| Embedding Dimension | 768 |
| Number of Layers | 12 |
| Context Length | 512 tokens |
| Author | sergeyzh |
| Model URL | HuggingFace |
What is BERTA?
BERTA is a distilled sentence embedding model for Russian and English text, created through knowledge distillation from the larger FRIDA model. It maintains high performance while halving the embedding dimension from 1536 to 768 and the depth from 24 to 12 layers, making it more efficient for production deployment.
Implementation Details
The model utilizes mean pooling instead of CLS pooling and preserves FRIDA's prefix system for specialized tasks. It supports a context window of 512 tokens and achieves comparable performance to its teacher model across various NLP tasks.
- Distilled architecture with 768-dimensional embeddings
- Optimized mean pooling strategy
- Comprehensive prefix support for task specialization
- Bilingual capability (Russian and English)
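The sketch below shows one way to load the model with the sentence-transformers library and encode prefixed text for retrieval. The repository id `sergeyzh/BERTA` and the FRIDA-style `search_query:` / `search_document:` prefixes are assumptions based on this description, not confirmed identifiers; check the model card for the exact prefix list.

```python
# Minimal sketch, assuming the model is published as "sergeyzh/BERTA"
# and that FRIDA-style task prefixes are prepended to the raw text.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sergeyzh/BERTA")  # 12 layers, 768-dim embeddings

queries = ["search_query: Какая погода в Москве?"]
documents = [
    "search_document: В Москве сегодня солнечно и тепло.",
    "search_document: The weather in London is rainy today.",
]

q_emb = model.encode(queries, normalize_embeddings=True)
d_emb = model.encode(documents, normalize_embeddings=True)

# With normalized embeddings, cosine similarity is a plain dot product
scores = q_emb @ d_emb.T
print(scores.shape)  # (1, 2)
```

Inputs longer than the 512-token context window are truncated, so long documents should be chunked before encoding.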
Core Capabilities
- Semantic Text Similarity (STS) with 0.822 Pearson correlation
- Paraphrase Identification (PI)
- Natural Language Inference (NLI)
- Sentiment Analysis (SA)
- Toxicity Identification (TI) with 0.986 accuracy
- Document Classification and Clustering
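As an illustration of the STS capability, the hedged sketch below scores sentence pairs with cosine similarity. The `paraphrase:` prefix mirrors FRIDA's prefix scheme and is an assumption here; substitute whichever prefix the model card documents for similarity tasks.

```python
# Hypothetical STS example; the "paraphrase: " prefix is assumed, not confirmed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sergeyzh/BERTA")

pairs = [
    ("paraphrase: Кошка спит на диване.", "paraphrase: На диване дремлет кот."),
    ("paraphrase: Кошка спит на диване.", "paraphrase: The stock market fell sharply."),
]

for a, b in pairs:
    emb = model.encode([a, b])
    score = util.cos_sim(emb[0], emb[1]).item()
    print(f"{score:.3f}")  # higher score => closer meaning
```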
Frequently Asked Questions
Q: What makes this model unique?
BERTA stands out for its efficient architecture that maintains near-FRIDA performance levels while using only half the parameter count. It's specifically optimized for Russian and English language processing with specialized prefix support for different NLP tasks.
Q: What are the recommended use cases?
The model excels in semantic similarity tasks, document classification, and information retrieval. It's particularly effective for applications requiring bilingual (Russian/English) text understanding, with strong performance in sentiment analysis and toxicity detection.