wav2vec2-mbart50-ru
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Architecture | Speech Encoder-Decoder |
| Primary Task | Russian Speech Recognition |
| Author | Ivan Bondarenko |
What is wav2vec2-mbart50-ru?
wav2vec2-mbart50-ru is a speech-to-text model for Russian. It uses Wav2Vec2-Large-Ru-Golos as the encoder and mBART-large-50 as the decoder, so it produces transcripts that already include punctuation and capitalization instead of a plain word sequence.
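The encoder-decoder pairing described above can be sketched with the Hugging Face Transformers library. This is a minimal sketch, not the author's reference code: the Hub repository id bond005/wav2vec2-mbart50-ru and the use of Wav2Vec2Processor as the matching preprocessor are assumptions that should be checked against the published model card, and the transcribe helper is a hypothetical name introduced for illustration.

```python
# Assumed Hub repository id; verify it against the published model card.
MODEL_ID = "bond005/wav2vec2-mbart50-ru"


def transcribe(waveforms, sampling_rate=16_000, model_id=MODEL_ID):
    """Transcribe a list of 1-D float waveforms into punctuated text.

    Hypothetical helper: downloads the checkpoint on first use.
    """
    # Heavy dependencies are imported lazily so that importing this
    # module does not require torch or transformers to be installed.
    import torch
    from transformers import SpeechEncoderDecoderModel, Wav2Vec2Processor

    # Assumption: the checkpoint ships a Wav2Vec2Processor configuration.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = SpeechEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # The feature extractor pads the batch to its longest waveform.
    inputs = processor(
        waveforms,
        sampling_rate=sampling_rate,
        return_tensors="pt",
        padding="longest",
    )
    with torch.no_grad():
        # The Wav2Vec2 encoder embeds the audio; the mBART decoder
        # generates the text tokens autoregressively.
        token_ids = model.generate(**inputs)
    return processor.batch_decode(token_ids, skip_special_tokens=True)
```

Calling transcribe with a list of waveforms returns one string per waveform. Loading the model inside the function keeps the sketch short; a real service would load it once and reuse it across requests.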
Implementation Details
The model was trained on multiple Russian speech datasets, including SberDevices Golos, Common Voice 6.0, Sova RuDevices, and Russian Librispeech. It expects 16 kHz audio input. Reported Word Error Rates (WER) range from 13.2% to 32.5%, depending on the test set.
- Uses the SpeechEncoderDecoderModel architecture
- Trained on 5 different Russian speech datasets
- Supports automatic punctuation and capitalization
- Processes 16 kHz audio input
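Because the model expects 16 kHz input, it helps to validate audio before inference. The sketch below uses only the Python standard library and assumes mono 16-bit PCM WAV files; the load_wav_16k helper is a hypothetical name, and other formats or sampling rates would need resampling with an audio library first.

```python
import struct
import wave

EXPECTED_RATE = 16_000  # sampling rate the model expects, in Hz


def load_wav_16k(path):
    """Read a mono 16-bit PCM WAV file as floats in [-1.0, 1.0).

    Raises ValueError if the file is not 16 kHz, mono, 16-bit.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    # WAV stores samples as little-endian signed 16-bit integers.
    count = len(frames) // 2
    samples = struct.unpack(f"<{count}h", frames)
    # Scale integers in [-32768, 32767] to floats in [-1.0, 1.0).
    return [sample / 32768.0 for sample in samples]
```

Rejecting mismatched files early is a deliberate choice here: audio passed at the wrong sampling rate would still run through the model but would likely produce poor transcripts.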
Core Capabilities
- Accurate Russian speech recognition with WER as low as 13.2%
- Automatic punctuation and capitalization of the transcript
- Handles varied recording conditions (crowd, farfield)
- Implemented in PyTorch
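The WER figures quoted above are the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below shows the standard definition; it applies no text normalization, and whether the reported scores were computed with punctuation and casing removed is not stated here, so treat the function as illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, word_error_rate("a b c d", "a x c") returns 0.5: one substitution and one deletion against four reference words. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.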
Frequently Asked Questions
Q: What makes this model unique?
The model combines a Wav2Vec2 encoder with an mBART-50 decoder, which lets it add punctuation and capitalization while transcribing instead of requiring a separate post-processing step. It was trained on several Russian speech datasets, which helps it cope with different speaking and recording conditions.
Q: What are the recommended use cases?
The model suits Russian speech transcription tasks where the output should be readable, formatted text. It is particularly suited to crowd-sourced audio, farfield recordings, and general speech recognition scenarios where punctuation and capitalization matter.