# LaBSE-en-ru

| Property | Value |
|---|---|
| Parameter Count | 129M |
| Author | cointegrated |
| Paper | Language-agnostic BERT Sentence Embedding |
| Model Type | Bilingual BERT |
| Languages | English, Russian |
## What is LaBSE-en-ru?

LaBSE-en-ru is a bilingual version of Google's Language-agnostic BERT Sentence Embedding (LaBSE) model, optimized specifically for English and Russian. It is a significant size optimization: the model is only about 27% of the size of the original, while the quality of the embeddings for these two languages is preserved.
## Implementation Details

The model uses the BERT architecture with a vocabulary truncated to English and Russian tokens only, a roughly 90% reduction in vocabulary size. With 129M parameters, it generates sentence embeddings efficiently using PyTorch and the Transformers library (see the sketch after the list below).
- Optimized vocabulary focused on English and Russian tokens
- Supports sentence similarity tasks
- Implements efficient embedding generation
- Uses normalized pooler output for representations
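A minimal sketch of embedding generation with Transformers, assuming the model is published on the Hugging Face Hub as `cointegrated/LaBSE-en-ru` (author and model name taken from the table above):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the bilingual model and its truncated tokenizer
tokenizer = AutoTokenizer.from_pretrained("cointegrated/LaBSE-en-ru")
model = AutoModel.from_pretrained("cointegrated/LaBSE-en-ru")

sentences = ["Hello, world!", "Привет, мир!"]

# Tokenize with padding, truncating at the model's 64-token limit
inputs = tokenizer(sentences, padding=True, truncation=True,
                   max_length=64, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# L2-normalize the pooler output to get the sentence representations
embeddings = torch.nn.functional.normalize(outputs.pooler_output, dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```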
## Core Capabilities
- Bilingual sentence embedding generation
- Cross-lingual sentence similarity comparison
- Efficient processing with reduced model size
- Maximum sequence length of 64 tokens
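Because the embeddings are L2-normalized, cross-lingual similarity reduces to a dot product. A short example, reusing `tokenizer` and `model` from the sketch above:

```python
def embed(texts):
    # Encode a batch of sentences into normalized embeddings
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=64, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return torch.nn.functional.normalize(out.pooler_output, dim=1)

english = ["The cat sits on the mat.", "I love machine learning."]
russian = ["Кошка сидит на коврике.", "Я люблю машинное обучение."]

# Rows: English sentences; columns: Russian sentences
similarity = embed(english) @ embed(russian).T
print(similarity)  # translation pairs score highest on the diagonal
```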
## Frequently Asked Questions
Q: What makes this model unique?
Its uniqueness lies in efficient bilingual optimization: it preserves the embedding quality of the original LaBSE for English and Russian while shrinking the model to roughly a quarter of the original size, with a vocabulary focused on those two languages.
Q: What are the recommended use cases?
The model is ideal for cross-lingual sentence similarity tasks between English and Russian, document alignment, and bilingual text processing applications where efficient computation is required.
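As an illustration of document alignment, here is a hypothetical greedy aligner built on the `embed` helper above, matching each English sentence to its most similar Russian sentence. This is a sketch under simplifying assumptions, not a production aligner; real pipelines typically add similarity thresholds or one-to-one matching constraints:

```python
def align(en_sentences, ru_sentences):
    # Greedy alignment: pair each English sentence with its nearest
    # Russian sentence by cosine similarity (dot product of unit vectors)
    sim = embed(en_sentences) @ embed(ru_sentences).T
    best = sim.argmax(dim=1).tolist()
    return [(en_sentences[i], ru_sentences[j]) for i, j in enumerate(best)]

pairs = align(
    ["Good morning.", "See you tomorrow."],
    ["До завтра.", "Доброе утро."],
)
print(pairs)  # each English sentence paired with its Russian counterpart
```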