sci-rus-tiny
| Property | Value |
|---|---|
| License | MIT |
| Languages | Russian, English |
| Output Dimensions | 312 |
| Developer | MLSA Lab, Institute for AI, MSU |
What is sci-rus-tiny?
sci-rus-tiny is an embedding model for scientific texts in Russian and English. Developed by the MLSA Lab at MSU's Institute for AI, it was trained on eLibrary data using contrastive learning and performs strongly on the ruSciBench benchmark, making it well suited to scientific text analysis.
Implementation Details
The model implements a transformer-based architecture and produces 312-dimensional embeddings. It can be integrated through either the transformers library or the sentence-transformers framework, offering flexibility in how it is deployed. For a full document representation, the model encodes the title and abstract together, joined by a special separator token.
- Supports both PyTorch and Hugging Face Transformers implementations
- Includes built-in text normalization and pooling mechanisms
- Optimized for scientific content processing
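The pooling and normalization steps mentioned above can be sketched in plain NumPy. The shapes (a small batch of token vectors with the model's 312-dimensional hidden size) and random values are illustrative stand-ins for real transformer outputs:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions marked 0 in the mask."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each embedding to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-9, None)

# Toy batch: 2 sequences, 4 tokens each, 312-dimensional hidden states.
tokens = np.random.rand(2, 4, 312)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # trailing zeros are padding
sentence_embeddings = l2_normalize(mean_pool(tokens, mask))
print(sentence_embeddings.shape)  # (2, 312)
```

Mean pooling over non-padding tokens followed by L2 normalization is the standard recipe in sentence-transformers-style models; the library's built-in pooling layer performs the equivalent computation internally.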
Core Capabilities
- Bilingual text embedding generation (Russian/English)
- Scientific document similarity analysis
- Feature extraction for downstream tasks
- Masked language modeling support
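Document similarity with these embeddings reduces to cosine similarity between vectors. A minimal sketch with placeholder 312-dimensional vectors (real vectors would come from encoding documents with the model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings standing in for model output.
rng = np.random.default_rng(0)
doc_a = rng.standard_normal(312)
doc_b = rng.standard_normal(312)

score = cosine_similarity(doc_a, doc_b)
assert -1.0 <= score <= 1.0
print(cosine_similarity(doc_a, doc_a))  # a document is maximally similar to itself
```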
Frequently Asked Questions
Q: What makes this model unique?
The model's specialization in scientific text processing and its bilingual capabilities (Russian/English) make it particularly valuable for academic and research applications. Its lightweight architecture ('tiny' version) ensures efficient processing while maintaining high performance on scientific text tasks.
Q: What are the recommended use cases?
The model is ideal for scientific document similarity matching, academic paper classification, research content analysis, and building scientific search engines. It's particularly useful when working with mixed Russian-English scientific corpora.
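A scientific search engine built on this model amounts to ranking corpus embeddings by similarity to a query embedding. The sketch below uses random unit vectors as stand-ins for encoded documents (the 312-dimensional size matches the model; everything else is illustrative):

```python
import numpy as np

# Placeholder corpus: 5 L2-normalized 312-dim vectors standing in for
# document embeddings produced by the model.
rng = np.random.default_rng(42)
corpus = rng.standard_normal((5, 312))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# Simulate a query that is semantically close to document 3.
query = corpus[3] + 0.01 * rng.standard_normal(312)
query /= np.linalg.norm(query)

scores = corpus @ query        # dot product of unit vectors = cosine similarity
ranking = np.argsort(-scores)  # best match first
print(int(ranking[0]))         # index of the most similar document
```

The same dot-product ranking scales to large corpora with an approximate nearest-neighbor index once the embeddings are precomputed.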