USER-base

deepvk

Russian-specific sentence encoder that maps text to 768D vectors. Built on DeBERTa-v1-base with 85M params. Optimized for semantic search & clustering.

Property            | Value
Model Size          | 85M parameters
Embedding Dimension | 768
Base Architecture   | DeBERTa-v1-base
Hugging Face        | deepvk/USER-base

What is USER-base?

USER-base (Universal Sentence Encoder for Russian) is a specialized sentence transformer model designed exclusively for the Russian language. It transforms Russian text into 768-dimensional dense vector representations, making it ideal for semantic search, clustering, and other NLP tasks. The model builds upon deepvk/deberta-v1-base and has been extensively trained on Russian language data.
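To illustrate how the 768-dimensional vectors are consumed downstream, the sketch below compares two embeddings by cosine similarity. It uses toy NumPy vectors in place of real model outputs; in practice the vectors would come from encoding Russian text with USER-base.

```python
# Toy sketch: comparing 768-dimensional sentence embeddings by cosine
# similarity. Random vectors stand in for real USER-base outputs.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
u = rng.standard_normal(768)  # stand-in for an encoded query
v = rng.standard_normal(768)  # stand-in for an encoded passage

print(round(cosine(u, u), 6))        # identical vectors score 1.0
print(-1.0 <= cosine(u, v) <= 1.0)   # similarity always lies in [-1, 1]
```

The same comparison is what semantic search and clustering reduce to once every text has been mapped into the shared 768-dimensional space.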

Implementation Details

The model follows a training approach inspired by bge-base-en, adapted for Russian. Training proceeded in two stages: contrastive pre-training with weak supervision on the Russian mMARCO corpus, followed by supervised fine-tuning on both symmetric and asymmetric data. The models fine-tuned on these different data regimes are then merged into a single checkpoint with the LM-Cocktail technique.

  • Trained on over 3.3M positive pairs and 792K negative pairs
  • Implements both query and passage embeddings with specific prefixes
  • Outperforms other base-sized models on Encodechka and MTEB benchmarks

Core Capabilities

  • High-quality Russian text embeddings for semantic similarity tasks
  • Efficient information retrieval and passage matching
  • Clustering and semantic search optimization
  • Competitive performance on Russian NLP benchmarks

Frequently Asked Questions

Q: What makes this model unique?

USER-base is specifically optimized for Russian language processing, offering state-of-the-art performance while maintaining a relatively compact size of 85M parameters. It achieves impressive results on both Encodechka (0.772) and MTEB (0.666) benchmarks, outperforming other models of similar size.

Q: What are the recommended use cases?

The model excels in several scenarios:

  • Asymmetric tasks such as passage retrieval and QA: prefix queries with "query: " and documents with "passage: "
  • Symmetric tasks such as semantic similarity and paraphrase detection: prefix both texts with "query: "
  • Embedding-based features for downstream classification or clustering
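Following those prefix rules, here is a minimal retrieval sketch built on the sentence-transformers API. The helper functions are illustrative, not part of any library; the heavy import is deferred so the prefixing logic stands alone, and running `rank_passages` requires installing sentence-transformers and downloading the model.

```python
def with_prefix(texts, prefix):
    """Prepend the task prefix USER-base expects ("query" or "passage")."""
    return [f"{prefix}: {t}" for t in texts]

def rank_passages(query, passages, model_name="deepvk/USER-base"):
    """Score passages against a query using asymmetric prefixes.

    Requires `pip install sentence-transformers` and network access to
    fetch the model on first use, hence the lazy import.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    q = model.encode(with_prefix([query], "query"), normalize_embeddings=True)
    p = model.encode(with_prefix(passages, "passage"), normalize_embeddings=True)
    scores = (q @ p.T)[0]  # normalized vectors: dot product == cosine
    return sorted(zip(passages, scores), key=lambda pair: -pair[1])

# Usage:
# rank_passages("Какая столица России?",
#               ["Москва — столица России.", "Байкал — самое глубокое озеро."])
```

For symmetric tasks, the same pattern applies with "query: " on both sides of the comparison.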
