deberta-v1-base

deepvk

DeBERTa-v1-base is a 124M-parameter Russian language model trained on 400GB of text. It has 12 encoder layers and achieves strong performance on the Russian SuperGLUE benchmark.

Property          Value
Parameter Count   124M
License           Apache 2.0
Languages         Russian, English
Training Data     400GB text corpus
Author            deepvk

What is deberta-v1-base?

DeBERTa-v1-base is a pretrained bidirectional encoder designed specifically for Russian language processing. Developed by deepvk, it was trained on a 400GB corpus drawn from diverse sources, including Wikipedia, books, social media, and news content.
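
A minimal loading sketch with the Hugging Face transformers library, assuming the checkpoint is published on the Hub under the repo id deepvk/deberta-v1-base:

```python
# Load the tokenizer and encoder (repo id assumed from this card).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
model = AutoModel.from_pretrained("deepvk/deberta-v1-base")

inputs = tokenizer("Пример текста на русском языке.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 768)
```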

Implementation Details

The model has 12 encoder layers, 12 attention heads, and an embedding dimension of 768. It uses the GeLU activation function and byte-level BPE tokenization with a vocabulary of 50,266 tokens. Training ran in FP16 mixed precision on 8xA100 GPUs for approximately 30 days.
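
These hyperparameters can be summarized as a transformers DebertaConfig. This is a reconstruction from the figures in this card, not the released configuration file, so individual field values may differ:

```python
from transformers import DebertaConfig

# Hypothetical config mirroring the architecture described in this card.
config = DebertaConfig(
    num_hidden_layers=12,         # 12 encoder layers
    num_attention_heads=12,       # 12 attention heads
    hidden_size=768,              # embedding dimension
    intermediate_size=3072,       # FFN dimension
    hidden_act="gelu",            # GeLU activation
    vocab_size=50266,             # byte-level BPE vocabulary
    max_position_embeddings=512,  # maximum sequence length
)
```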

  • 12 encoder layers with 12 attention heads
  • 768-dimensional embeddings with a 3,072-dimensional FFN
  • Trained with AdamW optimizer and linear learning rate scheduler
  • Training corpus deduplicated with the MinHash algorithm (sketched below)
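
The deduplication step can be illustrated with a small sketch using the datasketch library; the hashing parameters and similarity threshold here are illustrative, as the card does not publish deepvk's exact settings:

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    """Build a MinHash signature from whitespace tokens."""
    m = MinHash(num_perm=num_perm)
    for token in text.lower().split():
        m.update(token.encode("utf8"))
    return m

docs = {
    "a": "первый документ о погоде",          # "first document about the weather"
    "b": "первый документ о погоде сегодня",  # near-duplicate of "a"
    "c": "совсем другой текст",               # unrelated text
}

# Index all documents, then query for near-duplicates of "a".
lsh = MinHashLSH(threshold=0.8, num_perm=128)
for key, text in docs.items():
    lsh.insert(key, minhash(text))
print(lsh.query(minhash(docs["a"])))  # expected to include "a" and likely "b"
```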

Core Capabilities

  • Strong performance on Russian SuperGLUE benchmark tasks
  • Effective feature extraction for Russian language understanding
  • Maximum sequence length of 512 tokens
  • Robust handling of both Russian and English text
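
A minimal feature-extraction sketch: truncate to the 512-token limit and mean-pool the final hidden states into a fixed-size sentence vector. The pooling strategy is an assumption; the card does not prescribe one:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
model = AutoModel.from_pretrained("deepvk/deberta-v1-base")
model.eval()

texts = ["Москва является столицей России.", "Moscow is the capital of Russia."]
batch = tokenizer(texts, padding=True, truncation=True, max_length=512,
                  return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state          # (batch, seq, 768)

# Masked mean pooling over non-padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()   # (batch, seq, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                                # torch.Size([2, 768])
```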

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its pretraining on a carefully curated and deduplicated 400GB Russian-language corpus. It achieves state-of-the-art results on several Russian SuperGLUE tasks, with particularly strong scores on the PARus and MuSeRC benchmarks.

Q: What are the recommended use cases?

This model is particularly well-suited for feature extraction tasks in Russian language processing, including text classification, semantic analysis, and general language understanding tasks. It's designed as an encoder-only model without any pretrained head, making it versatile for various downstream tasks.
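
Because the checkpoint ships without a task head, downstream use typically adds one. A sketch using AutoModelForSequenceClassification, which places a freshly initialized (untrained) classification head on top of the encoder; the label count is illustrative:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The classification head is randomly initialized and must be fine-tuned.
model = AutoModelForSequenceClassification.from_pretrained(
    "deepvk/deberta-v1-base",
    num_labels=2,  # e.g. binary text classification; value is illustrative
)
tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
# Fine-tune with a standard loop or transformers.Trainer on labeled data.
```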
