Whisper Medium Urdu
Property | Value |
---|---|
Parameter Count | 764M |
License | Apache 2.0 |
Framework | PyTorch |
WER Score | 26.98% |
What is whisper-medium-urdu?
Whisper-medium-urdu is an automatic speech recognition (ASR) model for Urdu, fine-tuned from OpenAI's Whisper medium model. It was trained on Urdu speech from the Mozilla Common Voice dataset, version 11.0.
Implementation Details
The model uses Whisper's encoder-decoder transformer architecture (764M parameters) and is implemented in PyTorch. Training used mixed precision (Native AMP) and the Adam optimizer with β1=0.9, β2=0.999 and ε=1e-08, together with the following settings:
- Learning rate: 1e-05 with linear scheduler
- Batch sizes: 32 (training) and 16 (evaluation)
- Training steps: 300 with 40 warmup steps
- Best validation loss: 0.4685
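The learning-rate settings above can be sketched as a schedule function. This is a minimal sketch that assumes "linear scheduler" means the common Hugging Face-style behaviour: the rate ramps from 0 to the peak over the warmup steps, then decays linearly to 0 at the last training step. The card does not spell out the exact formula, and the function name `linear_warmup_lr` is hypothetical.

```python
# Sketch of a linear warmup + linear decay learning-rate schedule.
# Assumption: this mirrors the common Hugging Face "linear" scheduler; the
# card itself only states "linear scheduler" with 40 warmup steps.

PEAK_LR = 1e-5       # learning rate from the card
WARMUP_STEPS = 40    # warmup steps from the card
TOTAL_STEPS = 300    # training steps from the card


def linear_warmup_lr(step: int,
                     peak_lr: float = PEAK_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Return the learning rate used at a given optimizer step."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # After warmup, decay linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 20, 40, 170, 300):
        print(step, linear_warmup_lr(step))
```

Under these assumptions the rate peaks at 1e-05 at step 40, falls to half the peak at step 170 (midway through the 260 decay steps), and reaches 0 at step 300.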
Core Capabilities
- Specialized Urdu speech recognition
- WER of 26.98% on the test set
- Weights stored as F32 (32-bit floating point) tensors
- TensorBoard training logs included
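The F32 tensor type listed above determines how much memory the weights alone occupy. The back-of-the-envelope calculation below assumes the 764M parameter count from the table and 4 bytes per F32 value; a half-precision figure is included only for comparison. It ignores activations, optimizer state and framework overhead, and the helper `weight_bytes` is hypothetical.

```python
# Back-of-the-envelope size of the model weights for a given dtype.
# Assumptions: 764M parameters (from the table above), 4 bytes per float32
# value and 2 per float16; activations and framework overhead are ignored.

PARAM_COUNT = 764_000_000
BYTES_PER_VALUE = {"float32": 4, "float16": 2}


def weight_bytes(n_params: int, dtype: str) -> int:
    """Bytes needed to hold n_params weights stored as `dtype`."""
    return n_params * BYTES_PER_VALUE[dtype]


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        gb = weight_bytes(PARAM_COUNT, dtype) / 1e9
        print(f"{dtype}: {gb:.2f} GB")  # float32: 3.06 GB, float16: 1.53 GB
```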
Frequently Asked Questions
Q: What makes this model unique?
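The model is fine-tuned specifically for Urdu speech recognition and reaches a WER of 26.98% on its test set, which corresponds to roughly 27 word errors (substitutions, deletions or insertions) per 100 reference words. Fine-tuning adapts the multilingual Whisper medium model to Urdu; this card does not report a WER for the base model, so the gain over generic speech recognition models is not quantified.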
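Word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below computes it with a standard Levenshtein dynamic program. Whitespace tokenization and the function name `word_error_rate` are assumptions; the card does not state how its WER was computed (for example, which text normalization was applied).

```python
# Minimal word error rate (WER) via word-level edit distance:
# WER = (substitutions + deletions + insertions) / number of reference words.
# Assumption: plain whitespace tokenization with no text normalization.


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and
    # the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("میں گھر جا رہا ہوں", "میں گھر جا رہی ہوں"))  # 0.2
```

Because insertions count as errors, WER is not bounded by 100%: a hypothesis with many extra words can exceed it.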
Q: What are the recommended use cases?
The model suits Urdu speech transcription tasks, including but not limited to automated subtitling, voice command systems, and other speech-to-text applications for Urdu-language content.
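For the subtitling use case, timestamped transcription segments need to be written in a subtitle format. The sketch below converts segments into SubRip (.srt) text. It assumes the segments look like the `chunks` returned by a Hugging Face `transformers` ASR pipeline called with `return_timestamps=True` (dicts with `"text"` and a `(start, end)` `"timestamp"` tuple, in seconds); the helper names and the 2-second fallback for a missing end time are hypothetical choices.

```python
# Sketch: turn timestamped transcription chunks into SubRip (.srt) subtitles.
# Assumption: each chunk is a dict with "text" and a (start_s, end_s)
# "timestamp" tuple, where the final end time may be None.


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list[dict]) -> str:
    """Build an SRT document with one numbered cue per chunk."""
    cues = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Hypothetical fallback: give an open-ended cue 2 seconds.
            end = start + 2.0
        cues.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    example = [
        {"text": " میں گھر جا رہا ہوں", "timestamp": (0.0, 2.5)},
        {"text": " شکریہ", "timestamp": (2.5, None)},
    ]
    print(chunks_to_srt(example))
```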