BERTA
| Property | Value |
|---|---|
| Model Type | Sentence Transformer |
| Embedding Dimension | 768 |
| Number of Layers | 12 |
| Context Length | 512 tokens |
| Author | sergeyzh |
| Model URL | HuggingFace |
What is BERTA?
BERTA is a distilled sentence embedding model for Russian and English text, created through knowledge distillation from the larger FRIDA model. It maintains high performance while halving the embedding dimension from 1536 to 768 and the depth from 24 to 12 layers, making it more efficient for production deployment.
Implementation Details
The model utilizes mean pooling instead of CLS pooling and preserves FRIDA's prefix system for specialized tasks. It supports a context window of 512 tokens and achieves comparable performance to its teacher model across various NLP tasks.
- Distilled architecture with 768-dimensional embeddings
- Optimized mean pooling strategy
- Comprehensive prefix support for task specialization
- Bilingual capability (Russian and English)
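The sketch below shows one way to load the model with the sentence-transformers library and encode prefixed text for retrieval. The repository id `sergeyzh/BERTA` and the FRIDA-style `search_query:` / `search_document:` prefixes are assumptions based on this description, not confirmed identifiers; check the model card for the exact prefix list.

```python
# Minimal sketch, assuming the model is published as "sergeyzh/BERTA"
# and that FRIDA-style task prefixes are prepended to the raw text.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sergeyzh/BERTA")  # 12 layers, 768-dim embeddings

queries = ["search_query: Какая погода в Москве?"]
documents = [
    "search_document: В Москве сегодня солнечно и тепло.",
    "search_document: The weather in London is rainy today.",
]

q_emb = model.encode(queries, normalize_embeddings=True)
d_emb = model.encode(documents, normalize_embeddings=True)

# With normalized embeddings, cosine similarity is a plain dot product
scores = q_emb @ d_emb.T
print(scores.shape)  # (1, 2)
```

Inputs longer than the 512-token context window are truncated, so long documents should be chunked before encoding.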
Core Capabilities
- Semantic Text Similarity (STS) with 0.822 Pearson correlation
- Paraphrase Identification (PI)
- Natural Language Inference (NLI)
- Sentiment Analysis (SA)
- Toxicity Identification (TI) with 0.986 accuracy
- Document Classification and Clustering
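As an illustration of the STS capability, the hedged sketch below scores sentence pairs with cosine similarity. The `paraphrase:` prefix mirrors FRIDA's prefix scheme and is an assumption here; substitute whichever prefix the model card documents for similarity tasks.

```python
# Hypothetical STS example; the "paraphrase: " prefix is assumed, not confirmed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sergeyzh/BERTA")

pairs = [
    ("paraphrase: Кошка спит на диване.", "paraphrase: На диване дремлет кот."),
    ("paraphrase: Кошка спит на диване.", "paraphrase: The stock market fell sharply."),
]

for a, b in pairs:
    emb = model.encode([a, b])
    score = util.cos_sim(emb[0], emb[1]).item()
    print(f"{score:.3f}")  # higher score => closer meaning
```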
Frequently Asked Questions
Q: What makes this model unique?
BERTA stands out for its efficient architecture that maintains near-FRIDA performance levels while using only half the parameter count. It's specifically optimized for Russian and English language processing with specialized prefix support for different NLP tasks.
Q: What are the recommended use cases?
The model excels in semantic similarity tasks, document classification, and information retrieval. It's particularly effective for applications requiring bilingual (Russian/English) text understanding, with strong performance in sentiment analysis and toxicity detection.