wav2vec2-mbart50-ru

bond005

A Russian speech-to-text model combining Wav2Vec2 and mBART-50 architectures, achieving WER of 13-32% across multiple datasets. Handles punctuation and capitalization.

Property | Value
License | Apache 2.0
Architecture | Speech Encoder-Decoder
Primary Task | Russian Speech Recognition
Author | Ivan Bondarenko

What is wav2vec2-mbart50-ru?

wav2vec2-mbart50-ru is a speech-to-text model for Russian. It uses Wav2Vec2-Large-Ru-Golos as the encoder and mBART-large-50 as the decoder. Besides transcribing speech, it restores punctuation and capitalization in the output text.

Implementation Details

The model was trained on several Russian speech datasets, including SberDevices Golos, Common Voice 6.0, Sova RuDevices, and Russian Librispeech. It expects 16kHz audio input, and its reported Word Error Rate (WER) ranges from 13.2% to 32.5% depending on the test set.
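WER is the word-level edit distance between the model's output and a reference transcript, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions. The sketch below computes it with the standard dynamic-programming recurrence. It assumes plain whitespace tokenization; the tokenization and text normalization behind the reported figures are not stated here, and the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution out of three reference words.
print(word_error_rate("мама мыла раму", "мама мыла рану"))  # -> 0.3333333333333333
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.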

  • Utilizes SpeechEncoderDecoderModel architecture
  • Trained on 5 different Russian speech datasets
  • Supports automatic punctuation and capitalization
  • Processes 16kHz audio input
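Since the model expects 16kHz input, it helps to check the audio format before inference. The following is a minimal standard-library sketch, assuming 16-bit PCM mono WAV files. The function name `load_wav_16k_mono` is hypothetical, and resampling is deliberately left out: files at other rates are rejected so they can be resampled with a tool of your choice first.

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # the model expects 16 kHz input


def load_wav_16k_mono(path: str) -> list[float]:
    """Read a 16-bit PCM mono WAV file and return samples scaled to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz audio, got {rate} Hz")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    # Scale by 2**15 so the int16 range maps onto [-1.0, 1.0).
    return [s / 32768.0 for s in samples]
```

A list of floats at 16kHz is a form of waveform that Wav2Vec2-style feature extractors accept.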

Core Capabilities

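  • Russian speech recognition with WER as low as 13.2% on the best-scoring test set
  • Automatic punctuation and capitalization of the output text
  • Handles different recording conditions, including crowd-sourced (crowd) and far-field (farfield) audio
  • PyTorch implementation based on the Hugging Face Transformers SpeechEncoderDecoderModel class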
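A hedged inference sketch with Hugging Face Transformers follows. It assumes the checkpoint is published on the Hub as `bond005/wav2vec2-mbart50-ru` (the author handle and model name shown on this page), and that `AutoProcessor` resolves to a processor bundling the Wav2Vec2 feature extractor with the mBART-50 tokenizer. The helper names `_load_model` and `transcribe` are hypothetical. Heavy imports are deferred so the sampling-rate check runs before any download.

```python
from functools import lru_cache

# Assumed Hugging Face Hub ID, built from the author handle and model name on this page.
MODEL_ID = "bond005/wav2vec2-mbart50-ru"
TARGET_RATE = 16_000  # the model expects 16 kHz audio


@lru_cache(maxsize=1)
def _load_model():
    """Load the processor and model once; later calls reuse the cached pair."""
    # Deferred import: transformers is only needed when the model actually runs.
    from transformers import AutoProcessor, SpeechEncoderDecoderModel

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = SpeechEncoderDecoderModel.from_pretrained(MODEL_ID).eval()
    return processor, model


def transcribe(samples, sampling_rate: int = TARGET_RATE) -> str:
    """Transcribe one mono waveform (floats in [-1, 1]) into Russian text."""
    if sampling_rate != TARGET_RATE:
        # Fail fast before any model download; resample to 16 kHz first.
        raise ValueError(f"expected {TARGET_RATE} Hz audio, got {sampling_rate} Hz")

    import torch  # deferred for the same reason as above

    processor, model = _load_model()
    inputs = processor(samples, sampling_rate=TARGET_RATE, return_tensors="pt")
    with torch.no_grad():
        generated_ids = model.generate(**inputs)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

For example, call `transcribe(samples)` where `samples` is a 16kHz mono waveform given as a list of floats. Caching the loaded model with `lru_cache` avoids reloading the weights on every call.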

Frequently Asked Questions

Q: What makes this model unique?

The model pairs a Wav2Vec2 speech encoder with an mBART-50 text decoder, allowing it not only to transcribe speech but also to add punctuation and capitalization automatically. It was trained on several Russian speech datasets that cover different recording conditions, such as crowd-sourced and far-field audio.
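Because the output includes punctuation and capitalization, scoring it against a lowercase, unpunctuated reference would count formatting differences as word errors. One common option, shown here as an assumption and not as the procedure behind the reported figures, is to normalize both strings before computing WER: lowercase, fold "ё" to "е", and replace punctuation with spaces. The function name `normalize_for_wer` is hypothetical.

```python
import re
import unicodedata


def normalize_for_wer(text: str) -> str:
    """Lowercase, map 'ё' to 'е', drop punctuation, and collapse whitespace."""
    # NFC first so a decomposed 'е' + combining diaeresis becomes a single 'ё'.
    text = unicodedata.normalize("NFC", text).lower().replace("ё", "е")
    # Replace every character that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


print(normalize_for_wer("Привет, мир! Всё хорошо."))  # -> привет мир все хорошо
```

Note that this choice splits hyphenated words such as "кто-то" into two tokens; whatever rule you pick, apply it identically to the reference and the hypothesis.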

Q: What are the recommended use cases?

The model suits Russian transcription tasks where the output must be readable text with punctuation and capitalization. Its training and evaluation data cover crowd-sourced audio and far-field recordings as well as general speech, so it is a reasonable fit for those scenarios.
