rubert-tiny

cointegrated

Compact 12M parameter BERT model for Russian/English tasks, optimized for speed and size. Supports masked language modeling and sentence embeddings.

Parameter Count: 11.9M parameters
Model Size: 45 MB
License: MIT
Languages: Russian, English
Author: cointegrated

What is rubert-tiny?

rubert-tiny is a highly compressed, distilled version of the bert-base-multilingual-cased model specifically optimized for Russian and English language tasks. This lightweight model represents a significant achievement in model efficiency, being approximately 10 times smaller and faster than standard base-sized BERT models while maintaining practical utility for various NLP tasks.

Implementation Details

The model was trained with a combination of objectives: MLM loss distilled from bert-base-multilingual-cased, a translation ranking loss, and distillation of CLS embeddings from several teacher models, including LaBSE, rubert-base-cased-sentence, LASER, and USE. Training data incorporated the Yandex Translate corpus, OPUS-100, and Tatoeba datasets.

  • Efficient architecture with only 11.9M parameters
  • Supports both feature extraction and masked language modeling
  • Optimized for cross-lingual sentence embeddings
  • Compatible with PyTorch and Transformers library
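The points above can be exercised directly through the Transformers library. A minimal sketch of extracting a CLS sentence embedding, assuming the published Hugging Face checkpoint id `cointegrated/rubert-tiny` (the helper name `embed` is illustrative):

```python
# Sketch: CLS sentence embeddings from rubert-tiny via Transformers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("cointegrated/rubert-tiny")
model = AutoModel.from_pretrained("cointegrated/rubert-tiny")

def embed(text: str) -> torch.Tensor:
    """Return the CLS embedding for a single sentence."""
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**inputs)
    # The CLS embedding was distilled from sentence encoders
    # (LaBSE, LASER, USE), so it serves as a sentence-level vector.
    return output.last_hidden_state[:, 0, :]

vec = embed("Привет, мир!")  # "Hello, world!"
print(vec.shape)
```

The same `AutoModel` call works in both English and Russian, since the two languages share one vocabulary.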

Core Capabilities

  • Fill-mask prediction for Russian and English text
  • Sentence similarity computation
  • Feature extraction for downstream tasks
  • Cross-lingual embeddings generation
  • Efficient fine-tuning for specific NLP tasks
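The fill-mask capability can be sketched with the Transformers pipeline API; the model id is the published checkpoint, while the example sentence is arbitrary:

```python
# Sketch: masked-token prediction with rubert-tiny.
from transformers import pipeline

fill = pipeline("fill-mask", model="cointegrated/rubert-tiny")

# BERT-style models use the [MASK] placeholder token.
for pred in fill("Москва - [MASK] России."):  # "Moscow is the [MASK] of Russia."
    print(pred["token_str"], round(pred["score"], 3))
```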

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its exceptional efficiency-to-performance ratio, being 10x smaller than traditional BERT models while maintaining practical utility for Russian and English NLP tasks. Its unique training approach, combining multiple distillation sources and objectives, makes it particularly valuable for resource-constrained applications.

Q: What are the recommended use cases?

The model is ideal for scenarios requiring quick inference or limited computational resources, particularly suited for: NER tasks, sentiment classification, cross-lingual sentence embedding generation, and other basic NLP tasks where speed and size are prioritized over maximum accuracy.
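Sentence similarity from the embeddings above is typically scored with cosine similarity. A minimal, self-contained sketch, using random stand-in vectors in place of real model outputs (the dimension 312 is illustrative):

```python
# Sketch: cosine similarity between two sentence embeddings.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between vectors a and b, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=312)  # stand-in for a real CLS embedding
emb_b = rng.normal(size=312)

print(cosine_similarity(emb_a, emb_a))  # identical vectors score 1.0
print(cosine_similarity(emb_a, emb_b))
```

In practice the two inputs would be CLS embeddings of the sentences being compared; the closer the score is to 1.0, the more similar the sentences.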
