DeBERTa-v1-base
| Property | Value |
|---|---|
| Parameter Count | 124M |
| License | Apache 2.0 |
| Languages | Russian, English |
| Training Data | 400GB text corpus |
| Author | deepvk |
What is deberta-v1-base?
DeBERTa-v1-base is a pretrained bidirectional encoder designed for Russian language processing. Developed by deepvk, it targets Russian language understanding and was trained on a 400GB corpus drawn from diverse sources including Wikipedia, books, social media, and news content.
Implementation Details
The model has 12 encoder layers, 12 attention heads per layer, and an embedding dimension of 768. It uses the GeLU activation function and byte-level BPE tokenization with a vocabulary size of 50,266. Training ran in FP16 mixed precision on 8xA100 GPUs for approximately 30 days. These figures can be checked against the published checkpoint, as in the sketch following the list below.
- 12 encoder layers with 12 attention heads
- 768-dimensional embeddings with a 3,072-dimensional FFN
- Trained with AdamW optimizer and linear learning rate scheduler
- Training data deduplicated with the MinHash algorithm
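The architecture and tokenizer figures above can be confirmed directly from the checkpoint on the Hugging Face Hub. A minimal sketch, assuming the standard `transformers` AutoConfig/AutoTokenizer API and the `deepvk/deberta-v1-base` model ID:

```python
# Minimal sketch: inspect the published checkpoint's configuration.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("deepvk/deberta-v1-base")
tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")

print(config.num_hidden_layers)    # 12 encoder layers (per the card)
print(config.num_attention_heads)  # 12 attention heads
print(config.hidden_size)          # 768-dimensional embeddings
print(config.intermediate_size)    # 3,072-dimensional FFN
print(len(tokenizer))              # byte-level BPE vocabulary (~50,266)
```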
Core Capabilities
- Strong performance on Russian SuperGLUE benchmark tasks
- Effective feature extraction for Russian language understanding
- Maximum sequence length of 512 tokens
- Robust handling of both Russian and English text (see the tokenization sketch after this list)
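As a concrete illustration of the 512-token limit and the mixed Russian/English input, here is a small tokenization sketch; it assumes the standard `transformers` tokenizer API and `torch` for tensor output:

```python
# Minimal sketch: batch-tokenize Russian and English text within the 512-token limit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")

texts = [
    "Модель обучена на большом корпусе русскоязычных текстов.",  # Russian
    "The encoder also accepts English text.",                    # English
]

batch = tokenizer(
    texts,
    padding=True,       # pad the shorter sequence in the batch
    truncation=True,    # cut anything beyond the model's maximum length
    max_length=512,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (2, sequence_length <= 512)
```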
Frequently Asked Questions
Q: What makes this model unique?
A: The model stands out for its pretraining on a carefully curated and deduplicated 400GB Russian-language corpus. It achieves state-of-the-art results on several Russian SuperGLUE tasks, performing particularly well on the PARus and MuSeRC benchmarks.
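The card mentions MinHash-based deduplication but does not publish the pipeline itself. Purely as an illustration of the technique, the sketch below uses the `datasketch` library with assumed word-level shingling and an assumed similarity threshold; it is not the authors' actual preprocessing code:

```python
# Illustrative only: near-duplicate filtering with MinHash + LSH (datasketch).
from datasketch import MinHash, MinHashLSH

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in text.lower().split():   # naive word shingles, for illustration
        m.update(token.encode("utf-8"))
    return m

docs = {
    "doc1": "пример текста из корпуса",
    "doc2": "пример текста из корпуса с небольшим отличием",
    "doc3": "совсем другой документ о другой теме",
}

lsh = MinHashLSH(threshold=0.5, num_perm=128)  # threshold is an assumption
kept = []
for key, text in docs.items():
    mh = minhash_of(text)
    if lsh.query(mh):          # a similar document is already indexed
        continue               # drop the near-duplicate
    lsh.insert(key, mh)
    kept.append(key)

print(kept)  # doc2 is likely dropped as a near-duplicate of doc1
```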
Q: What are the recommended use cases?
A: The model is well suited to feature extraction for Russian language processing, including text classification, semantic analysis, and general language understanding. It is an encoder-only model shipped without a pretrained head, so it can be adapted to a variety of downstream tasks.
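Since the checkpoint ships without a pretrained head, a common pattern is to use it as a sentence encoder. A minimal sketch, assuming the standard `transformers`/`torch` API; the mean-pooling strategy is one reasonable choice, not a prescription from the authors:

```python
# Minimal sketch: extract a 768-dim sentence embedding with mask-aware mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
model = AutoModel.from_pretrained("deepvk/deberta-v1-base")
model.eval()

text = "Предобученный энкодер извлекает признаки из русскоязычного текста."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

mask = inputs["attention_mask"].unsqueeze(-1)                        # (1, seq_len, 1)
embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)  # (1, 768)
print(embedding.shape)
```

The resulting vector can then feed a lightweight classifier or similarity search for the downstream tasks listed above.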