wav2vec2-mbart50-ru

Maintained By
bond005

Property | Value
License | Apache 2.0
Architecture | Speech Encoder-Decoder
Primary Task | Russian Speech Recognition
Author | Ivan Bondarenko

What is wav2vec2-mbart50-ru?

wav2vec2-mbart50-ru is a speech-to-text model for Russian. It uses Wav2Vec2-Large-Ru-Golos as the encoder and mBART-large-50 as the decoder, so its output is not a bare word sequence but text with punctuation and capitalization.

Implementation Details

The model was trained on several Russian speech datasets, including SberDevices Golos (crowd and farfield subsets), Common Voice 6.0, Sova RuDevices, and Russian Librispeech. It requires 16 kHz audio input. The reported Word Error Rate (WER) ranges from 13.2% to 32.5%, depending on the test set.
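Because the model expects 16 kHz input, it can help to check a recording's sampling rate before transcription. The following is a minimal sketch using only Python's standard `wave` module. The function names `wav_sample_rate` and `needs_resampling` are hypothetical helpers introduced here for illustration, and the sketch assumes uncompressed PCM WAV files.

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate (Hz) that the model card says the model requires


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate in Hz stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected_rate: int = EXPECTED_RATE) -> bool:
    """Return True when the file's sampling rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

If `needs_resampling` returns True, the audio would have to be resampled to 16 kHz with an audio library of your choice before it is passed to the model.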

  • Utilizes SpeechEncoderDecoderModel architecture
  • Trained on 5 different Russian speech datasets
  • Supports automatic punctuation and capitalization
  • Processes 16kHz audio input
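The points above can be sketched as a short transcription helper built on the Hugging Face Transformers `SpeechEncoderDecoderModel` class. This is a sketch under assumptions, not the author's reference code: it assumes the checkpoint is published on the Hugging Face Hub as `bond005/wav2vec2-mbart50-ru`, that it loads with `Wav2Vec2Processor`, and that `max_length=256` is a reasonable generation limit (an arbitrary value chosen here). The `transcribe` function is a hypothetical name.

```python
from typing import Sequence

MODEL_ID = "bond005/wav2vec2-mbart50-ru"  # assumed Hugging Face Hub identifier
SAMPLING_RATE = 16_000  # the model card states that 16 kHz input is required


def transcribe(waveform: Sequence[float], max_length: int = 256) -> str:
    """Transcribe one mono 16 kHz waveform given as float samples.

    Assumes the checkpoint ships a Wav2Vec2Processor-compatible configuration.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import SpeechEncoderDecoderModel, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = SpeechEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    # Convert raw samples into model inputs; the rate must match the audio.
    inputs = processor(
        waveform,
        sampling_rate=SAMPLING_RATE,
        return_tensors="pt",
        padding=True,
    )

    # The decoder generates text tokens autoregressively from the encoded audio.
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=max_length)

    # Decode token ids back to text, dropping special tokens.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

For repeated calls, loading the processor and model once outside the function would avoid reloading the weights on every transcription.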

Core Capabilities

  • Russian speech recognition with a reported WER as low as 13.2% on the best-scoring test set
  • Punctuation and capitalization included in the output text
  • Handles various speech conditions (crowd, farfield)
  • PyTorch implementation
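To make the WER figures concrete, here is a minimal sketch of the standard word-level definition: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption; the source does not say how text was normalized (for example, whether punctuation and case were counted) when the reported scores were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word missing (deletion)
                curr[j - 1] + 1,                  # extra hypothesis word (insertion)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One wrong word out of three reference words.
    print(word_error_rate("мама мыла раму", "мама мыла рану"))
```

Under this definition, a WER of 13.2% corresponds to roughly 13 word errors per 100 reference words.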

Frequently Asked Questions

Q: What makes this model unique?

The model pairs a Wav2Vec2 encoder with an mBART-50 decoder, so it transcribes speech and adds punctuation and capitalization in the same pass. It was trained on several Russian speech datasets covering different recording conditions, such as crowd and farfield audio, which is intended to make it robust across speaking conditions.

Q: What are the recommended use cases?

The model suits Russian speech transcription tasks where the output should be readable text with punctuation and capitalization. Typical cases include crowd-sourced audio, farfield recordings, and general speech recognition. Since the reported WER varies from 13.2% to 32.5% across test sets, it is worth measuring accuracy on audio from the target domain.
