DeBERTa-v1-base
| Property | Value |
|---|---|
| Parameter Count | 124M |
| License | Apache 2.0 |
| Languages | Russian, English |
| Training Data | 400GB text corpus |
| Author | deepvk |
What is deberta-v1-base?
DeBERTa-v1-base is a pretrained bidirectional encoder designed for Russian language processing. Developed by deepvk, it targets Russian language understanding and was trained on a 400GB corpus drawn from diverse sources including Wikipedia, books, social media, and news content.
Implementation Details
The model has 12 encoder layers, 12 attention heads per layer, and an embedding dimension of 768. It uses the GeLU activation function and byte-level BPE tokenization with a vocabulary size of 50,266. Training ran in FP16 mixed precision on 8xA100 GPUs for approximately 30 days. These figures can be checked against the published checkpoint, as in the sketch following the list below.
- 12 encoder layers with 12 attention heads
- 768-dimensional embeddings with a 3,072-dimensional FFN
- Trained with AdamW optimizer and linear learning rate scheduler
- Training data deduplicated with the MinHash algorithm
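The architecture and tokenizer figures above can be confirmed directly from the checkpoint on the Hugging Face Hub. A minimal sketch, assuming the standard `transformers` AutoConfig/AutoTokenizer API and the `deepvk/deberta-v1-base` model ID:

```python
# Minimal sketch: inspect the published checkpoint's configuration.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("deepvk/deberta-v1-base")
tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")

print(config.num_hidden_layers)    # 12 encoder layers (per the card)
print(config.num_attention_heads)  # 12 attention heads
print(config.hidden_size)          # 768-dimensional embeddings
print(config.intermediate_size)    # 3,072-dimensional FFN
print(len(tokenizer))              # byte-level BPE vocabulary (~50,266)
```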
Core Capabilities
- Strong performance on Russian SuperGLUE benchmark tasks
- Effective feature extraction for Russian language understanding
- Maximum sequence length of 512 tokens
- Robust handling of both Russian and English text (see the tokenization sketch after this list)
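As a concrete illustration of the 512-token limit and the mixed Russian/English input, here is a small tokenization sketch; it assumes the standard `transformers` tokenizer API and `torch` for tensor output:

```python
# Minimal sketch: batch-tokenize Russian and English text within the 512-token limit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")

texts = [
    "Модель обучена на большом корпусе русскоязычных текстов.",  # Russian
    "The encoder also accepts English text.",                    # English
]

batch = tokenizer(
    texts,
    padding=True,       # pad the shorter sequence in the batch
    truncation=True,    # cut anything beyond the model's maximum length
    max_length=512,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (2, sequence_length <= 512)
```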
Frequently Asked Questions
Q: What makes this model unique?
A: The model stands out for its pretraining on a carefully curated and deduplicated 400GB Russian-language corpus. It achieves state-of-the-art results on several Russian SuperGLUE tasks, performing particularly well on the PARus and MuSeRC benchmarks.
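The card mentions MinHash-based deduplication but does not publish the pipeline itself. Purely as an illustration of the technique, the sketch below uses the `datasketch` library with assumed word-level shingling and an assumed similarity threshold; it is not the authors' actual preprocessing code:

```python
# Illustrative only: near-duplicate filtering with MinHash + LSH (datasketch).
from datasketch import MinHash, MinHashLSH

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in text.lower().split():   # naive word shingles, for illustration
        m.update(token.encode("utf-8"))
    return m

docs = {
    "doc1": "пример текста из корпуса",
    "doc2": "пример текста из корпуса с небольшим отличием",
    "doc3": "совсем другой документ о другой теме",
}

lsh = MinHashLSH(threshold=0.5, num_perm=128)  # threshold is an assumption
kept = []
for key, text in docs.items():
    mh = minhash_of(text)
    if lsh.query(mh):          # a similar document is already indexed
        continue               # drop the near-duplicate
    lsh.insert(key, mh)
    kept.append(key)

print(kept)  # doc2 is likely dropped as a near-duplicate of doc1
```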
Q: What are the recommended use cases?
A: The model is well suited to feature extraction for Russian language processing, including text classification, semantic analysis, and general language understanding. It is an encoder-only model shipped without a pretrained head, so it can be adapted to a variety of downstream tasks.
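Since the checkpoint ships without a pretrained head, a common pattern is to use it as a sentence encoder. A minimal sketch, assuming the standard `transformers`/`torch` API; the mean-pooling strategy is one reasonable choice, not a prescription from the authors:

```python
# Minimal sketch: extract a 768-dim sentence embedding with mask-aware mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
model = AutoModel.from_pretrained("deepvk/deberta-v1-base")
model.eval()

text = "Предобученный энкодер извлекает признаки из русскоязычного текста."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

mask = inputs["attention_mask"].unsqueeze(-1)                        # (1, seq_len, 1)
embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)  # (1, 768)
print(embedding.shape)
```

The resulting vector can then feed a lightweight classifier or similarity search for the downstream tasks listed above.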