whisper-large-v3-turbo-russian
| Property | Value |
|---|---|
| Author | dvislobokov |
| Training Dataset | Mozilla Common Voice 17 (118k samples) |
| Training Infrastructure | 2x A100 40GB GPUs, 128GB RAM, 2x Xeon 48 Core |
| Training Time | ~7 hours |
| Model Base | Whisper Large V3 |
What is whisper-large-v3-turbo-russian?
whisper-large-v3-turbo-russian is a speech recognition model fine-tuned for Russian transcription. It builds on OpenAI's Whisper Large V3 architecture and was fine-tuned on 118,000 audio samples from Mozilla Common Voice 17.
Implementation Details
The model was trained on two NVIDIA A100 40GB GPUs, with 128GB RAM and dual 48-core Xeon 2.4 GHz processors. Training took approximately 7 hours.
- Built on Whisper Large V3 architecture
- Trained on 118k Russian language audio samples
- Optimized for CPU and GPU deployment
- Includes timestamp generation capability
Core Capabilities
- Russian speech-to-text transcription
- Timestamp generation for transcribed text
- Compatible with both microphone input and audio file upload
- Deployable on CPU for accessibility
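The transcription and timestamp capabilities listed above can be sketched with the Hugging Face `transformers` speech recognition pipeline. This is a minimal sketch, not the author's documented usage: the repo id `dvislobokov/whisper-large-v3-turbo-russian` is an assumption built from the author and model name on this page, and `format_timestamp`, `format_chunks`, and `transcribe_with_timestamps` are hypothetical helper names. Decoding audio files through the pipeline also assumes `ffmpeg` is installed.

```python
from typing import Optional

# Assumed Hugging Face repo id, built from the author and model name above.
MODEL_ID = "dvislobokov/whisper-large-v3-turbo-russian"


def format_timestamp(seconds: Optional[float]) -> str:
    """Render seconds as MM:SS.ss. Whisper can leave the last end time open (None)."""
    if seconds is None:
        return "??:??.??"
    # Round first so that 59.999 becomes 01:00.00 instead of 00:60.00.
    minutes, secs = divmod(round(seconds, 2), 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_chunks(chunks: list[dict]) -> list[str]:
    """Turn pipeline chunks ({"timestamp": (start, end), "text": ...}) into lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
    return lines


def transcribe_with_timestamps(audio_path: str, device: str = "cpu") -> list[str]:
    """Transcribe one audio file and return timestamped lines."""
    # Imported lazily: transformers is a heavy dependency and the model is
    # downloaded on first use.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID, device=device)
    result = asr(
        audio_path,
        return_timestamps=True,  # segment-level timestamps
        generate_kwargs={"language": "russian", "task": "transcribe"},
    )
    return format_chunks(result["chunks"])


# Usage (downloads the model; "speech.wav" is a placeholder path):
# for line in transcribe_with_timestamps("speech.wav"):
#     print(line)
```

Passing `device="cuda:0"` instead of the default `"cpu"` moves inference to a GPU when one is available.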
Frequently Asked Questions
Q: What makes this model unique?
This model pairs the general-purpose Whisper Large V3 base with fine-tuning on Russian speech, so it targets Russian speech recognition specifically. Mozilla Common Voice is a crowd-sourced dataset, so the training data covers many speakers, speech patterns, and accents.
Q: What are the recommended use cases?
The model suits Russian speech transcription, whether from live microphone input or batch processing of audio files. It can be used in both production and research settings and deployed on either CPU or GPU.
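Batch processing with a CPU or GPU fallback might look like the sketch below. It is an illustration under assumptions, not a documented workflow for this model: the repo id is inferred from the author and model name, the helper names (`find_audio_files`, `pick_device`, `build_transcriber`, `transcribe_directory`) and the extension list are hypothetical, and the transcriber is passed in as a callable so the directory logic can be exercised without loading the model.

```python
from pathlib import Path
from typing import Callable

# Assumed Hugging Face repo id, built from the author and model name above.
MODEL_ID = "dvislobokov/whisper-large-v3-turbo-russian"

# Illustrative set of extensions; adjust to the formats your decoder supports.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def find_audio_files(root: Path) -> list[Path]:
    """Recursively collect audio files under root, in a stable sorted order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def pick_device() -> str:
    """Prefer the first CUDA GPU, otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def build_transcriber() -> Callable[[str], dict]:
    """Load the model once; the returned pipeline maps a file path to {"text": ...}."""
    from transformers import pipeline  # lazy import: heavy dependency

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        device=pick_device(),
        chunk_length_s=30,  # split long recordings into 30-second windows
    )


def transcribe_directory(root: Path, transcribe: Callable[[str], dict]) -> dict[str, str]:
    """Map each audio file's path (relative to root) to its transcript."""
    results = {}
    for path in find_audio_files(root):
        results[str(path.relative_to(root))] = transcribe(str(path))["text"].strip()
    return results


# Usage (downloads the model; "recordings" is a placeholder directory):
# transcripts = transcribe_directory(Path("recordings"), build_transcriber())
```

Loading the pipeline once and reusing it for every file avoids repeating the model load, which is usually the slowest step on CPU.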