whisper-large-v3-russian

Maintained By
antony66


Parameter Count: 1.54B
Model Type: Automatic Speech Recognition
Tensor Type: BF16
Language: Russian
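BF16 stores each weight in 2 bytes, so the parameter count converts directly into an approximate weight-memory figure. The sketch below is back-of-the-envelope arithmetic under stated assumptions: it counts weights only (no activations, decoder cache, or framework overhead), treats 1.54B as exact although it is rounded, and the helper name `weight_memory_gb` is hypothetical.

```python
# Approximate weight storage from parameter count and tensor type.
# Weights only: activations, the decoder KV cache and runtime overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 1.54e9  # rounded parameter count from the table above
    for dtype in ("bf16", "fp32"):
        print(f"{dtype}: ~{weight_memory_gb(params, dtype):.2f} GB")
```

In BF16 this comes to roughly 3.1 GB of weights, about half of what the same model would need in FP32.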

What is whisper-large-v3-russian?

whisper-large-v3-russian is a Russian speech recognition model fine-tuned from OpenAI's Whisper Large V3. On the Common Voice 17.0 dataset it lowers the Word Error Rate (WER) from 9.84% for the base Whisper Large V3 model to 6.39%, a relative reduction of about 35%. Fine-tuning took over 60 hours on two Tesla A100 80GB GPUs.
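WER counts the word substitutions, deletions and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance and also derives the relative improvement quoted above. It is an illustration, not the evaluation script used for this model: it splits on whitespace and applies no text normalization (casing, punctuation, "ё"/"е"), which can noticeably change reported WER values. The function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped."""
    return (before - after) / before


if __name__ == "__main__":
    # One inserted word against three reference words -> WER of 1/3.
    print(word_error_rate("привет как дела", "привет как твои дела"))
    print(f"{relative_reduction(9.84, 6.39):.1%}")  # 35.1%
```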

Implementation Details

The model keeps the Whisper architecture and is fine-tuned specifically for Russian. Training used the Russian portion of Common Voice 17.0, 237,644 rows in total, split 95/5 into 225,761 training rows and 11,883 test rows. The weights are stored in BF16, and the model runs on CPU, on CUDA GPUs, and on Apple hardware through the MPS backend.
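The reported row counts are consistent with a 95/5 cut that rounds the training size down. How the split was actually produced is not stated, so the seeded shuffle below is an assumption, shown only to make the arithmetic concrete; `train_test_split` here is a hypothetical helper, not a library function.

```python
import random


def train_test_split(n_rows, train_fraction=0.95, seed=0):
    """Shuffle row indices with a fixed seed and cut them into train/test lists.

    The training size is floor(n_rows * train_fraction); the rest goes to test.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


if __name__ == "__main__":
    total = 225_761 + 11_883  # row counts reported for this fine-tune
    train, test = train_test_split(total)
    print(total, len(train), len(test))  # 237644 225761 11883
```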

  • Built on Whisper Large V3 architecture with 1.54B parameters
  • Optimized for Russian language processing
  • Handles long recordings by splitting audio into 30-second chunks
  • Can generate timestamps
  • Compatible with FlashAttention-2 on supported GPUs
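Whisper processes audio in 30-second windows of 16 kHz input, so longer recordings are cut into chunks and each chunk's start time becomes the offset added to the timestamps produced inside it. Hugging Face's ASR pipeline exposes this through `chunk_length_s` (and `stride_length_s` for overlap) and merges overlapping text itself; the function below is a hypothetical illustration of the windowing only, not that implementation.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def chunk_spans(num_samples, chunk_length_s=30.0, overlap_s=0.0, sample_rate=SAMPLE_RATE):
    """Return (start_s, end_s) windows that cover a signal of num_samples samples.

    Consecutive windows overlap by overlap_s seconds; the last one may be shorter.
    """
    if not 0 <= overlap_s < chunk_length_s:
        raise ValueError("overlap_s must be in [0, chunk_length_s)")
    chunk = int(chunk_length_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    spans = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        spans.append((start / sample_rate, end / sample_rate))
        if end == num_samples:
            break
        start += step
    return spans


if __name__ == "__main__":
    seventy_seconds = 70 * SAMPLE_RATE
    print(chunk_spans(seventy_seconds))
    print(chunk_spans(seventy_seconds, overlap_s=5.0))
```

A word stamped at 4.2 s inside the chunk starting at 30.0 s therefore sits at 34.2 s in the full recording.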

Core Capabilities

  • High-accuracy Russian speech recognition
  • Optimized for phone call transcription
  • Batch processing with configurable chunk and batch sizes
  • Deployable on CPU, CUDA, and MPS backends
  • Accuracy benefits from audio preprocessing, particularly for telephone recordings
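Batch processing means feeding several audio chunks through the model in one forward pass; a larger batch size raises throughput at the cost of GPU memory. The generator below, a hypothetical helper named `batched`, shows only the grouping step (Python 3.12 ships a similar `itertools.batched` that yields tuples). Which batch size fits depends on the hardware and is not specified in the card.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of at most batch_size elements, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


if __name__ == "__main__":
    # A 10-minute recording split into twenty 30-second chunk spans.
    spans = [(i * 30.0, (i + 1) * 30.0) for i in range(20)]
    print([len(b) for b in batched(spans, 8)])  # [8, 8, 4]
```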

Frequently Asked Questions

Q: What makes this model unique?

This model is specialized for Russian: on Common Voice 17.0 it lowers WER from 9.84 for the base Whisper Large V3 model to 6.39. That gain is measured on Common Voice, so accuracy on other kinds of Russian audio, such as telephone recordings, may differ.

Q: What are the recommended use cases?

The model is particularly well-suited for phone call transcription, general Russian speech recognition, and other applications that need high-accuracy transcripts. Preprocessing the audio is recommended for best results, especially for telephone recordings.
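The card recommends preprocessing but does not name the steps. As one hedged example: telephone audio is often narrowband 8 kHz, while Whisper's feature extractor expects 16 kHz, so a pipeline usually converts the samples to floats, adjusts the level, and resamples. The sketch assumes 16-bit mono PCM at 8 kHz, peak-normalizes to an arbitrary 0.9, and upsamples by linear interpolation; all helper names are hypothetical. Upsampling cannot restore frequencies the phone line removed, and production code would normally use a dedicated resampler (for example ffmpeg, torchaudio, or librosa).

```python
from typing import List, Sequence


def pcm16_to_float(samples: Sequence[int]) -> List[float]:
    """Map signed 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def peak_normalize(x: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(v) for v in x), default=0.0)
    if peak == 0.0:
        return list(x)  # silence: nothing to scale
    gain = target_peak / peak
    return [v * gain for v in x]


def upsample_linear(x: Sequence[float], src_rate: int = 8_000, dst_rate: int = 16_000) -> List[float]:
    """Resample by linear interpolation between neighbouring input samples."""
    if not x:
        return []
    n_out = int(len(x) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate  # position of output sample k on the input grid
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]  # hold the last sample at the edge
        out.append(x[i] * (1 - frac) + nxt * frac)
    return out


def preprocess_telephone(samples_8k: Sequence[int]) -> List[float]:
    """Hypothetical chain: int16 PCM at 8 kHz -> normalized float audio at 16 kHz."""
    return upsample_linear(peak_normalize(pcm16_to_float(samples_8k)))
```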
