whisper-large-v3-russian
| Property | Value |
|---|---|
| Parameter Count | 1.54B |
| Model Type | Automatic Speech Recognition |
| Tensor Type | BF16 |
| Language | Russian |
What is whisper-large-v3-russian?
whisper-large-v3-russian is a Russian speech recognition model fine-tuned from OpenAI's Whisper Large V3. Fine-tuning reduces the Word Error Rate (WER) on the Common Voice 17.0 dataset from 9.84% to 6.39%, a relative reduction of roughly 35%. The model was trained for over 60 hours on two NVIDIA A100 80GB GPUs.
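To make the WER figures concrete, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function names are illustrative, and this is not the evaluation script used for this model. Published scores usually also normalize the text (case, punctuation) first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric dropped, e.g. WER before vs. after fine-tuning."""
    return (before - after) / before


if __name__ == "__main__":
    # One dropped word out of three reference words.
    print(word_error_rate("привет как дела", "привет дела"))
    # Relative WER reduction for the figures quoted above.
    print(round(relative_reduction(9.84, 6.39), 3))
```

With the quoted figures, `relative_reduction(9.84, 6.39)` comes out at about 0.35, meaning roughly a third fewer word errors than the base model on this dataset.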
Implementation Details
The model is built on the Whisper architecture and was fine-tuned on the Russian portion of the Common Voice 17.0 dataset, comprising over 200,000 entries split 95/5 into training and test sets (225,761 and 11,883 rows). Weights are stored in BF16 precision, and the model can run on CPU, CUDA, and MPS devices.
- Built on Whisper Large V3 architecture with 1.54B parameters
- Optimized for Russian language processing
- Supports audio chunking with 30-second segments
- Includes timestamp generation capabilities
- Compatible with flash attention 2 for supported GPUs
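The points above can be sketched with the Hugging Face `transformers` ASR pipeline. This is a minimal sketch under stated assumptions, not the author's reference code: the Hub id in `MODEL_ID` is an assumption (this page gives only the model name), `select_device` and `build_asr_pipeline` are illustrative helpers, and falling back to float32 off CUDA is a conservative choice rather than a requirement. Recent `transformers` releases prefer `dtype` over `torch_dtype`.

```python
# Assumed Hub id; this page states only the model name, so adjust as needed.
MODEL_ID = "antony66/whisper-large-v3-russian"


def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


def build_asr_pipeline(model_id: str = MODEL_ID, batch_size: int = 8):
    """Create a chunked ASR pipeline with timestamps enabled."""
    # Imported lazily so the helper above works without torch installed.
    import torch
    from transformers import pipeline

    device = select_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
    # BF16 on CUDA; float32 elsewhere as a conservative default (assumption).
    dtype = torch.bfloat16 if device.startswith("cuda") else torch.float32

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=dtype,
        device=device,
        chunk_length_s=30,       # 30-second chunks for long audio
        batch_size=batch_size,   # number of chunks decoded together
        return_timestamps=True,  # segment-level timestamps
        # On supported GPUs with flash-attn installed, one could also pass:
        # model_kwargs={"attn_implementation": "flash_attention_2"},
    )


def transcribe(path: str) -> dict:
    """Transcribe one audio file as Russian speech."""
    asr = build_asr_pipeline()
    return asr(path, generate_kwargs={"language": "russian", "task": "transcribe"})
```

Calling `transcribe("call.wav")` (a hypothetical file name) would return a dictionary with a `"text"` field and, because timestamps are enabled, a `"chunks"` list of timed segments.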
Core Capabilities
- High-accuracy Russian speech recognition
- Optimized for phone call transcription
- Batch processing support with customizable chunk sizes
- Flexible deployment options across different computing platforms
- Advanced audio preprocessing support for optimal recognition
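To illustrate what chunked batch processing means, the sketch below splits a long recording into overlapping windows of at most 30 seconds. It shows the general idea only: the function name and the 5-second overlap are assumptions, and ASR libraries implement their own chunking and merge logic internally.

```python
def chunk_spans(total_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Return (start, end) windows in seconds that cover the whole recording.

    Consecutive windows overlap by `overlap_s` so that words cut at a
    boundary appear intact in at least one window.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    total_s = float(total_s)
    step = chunk_s - overlap_s
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break  # the last window reached the end of the audio
        start += step
    return spans


if __name__ == "__main__":
    # A 70-second call is covered by three overlapping windows.
    for start, end in chunk_spans(70):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

Smaller chunks lower peak memory per batch item, while larger batches raise throughput on a GPU, so the two settings are usually tuned together.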
Frequently Asked Questions
Q: What makes this model unique?
The model is specialized for Russian. On the Common Voice 17.0 dataset it achieves a WER of 6.39%, compared with 9.84% for the base Whisper Large V3 model, after fine-tuning on more than 200,000 Russian Common Voice recordings.
Q: What are the recommended use cases?
The model is well-suited for phone call transcription, general Russian speech recognition, and other applications that require high-accuracy transcription. Preprocessing the audio before transcription is recommended, especially for telephone recordings.