Whisper Tamil Medium
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Base Model | OpenAI Whisper Medium |
| Training Framework | PyTorch |
| Primary Task | Automatic Speech Recognition |
What is whisper-tamil-medium?
Whisper-tamil-medium is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper-medium for Tamil. Developed at Speech Lab, IIT Madras, it reports a Word Error Rate (WER) of 6.5% on the Common Voice test set and 6.97% on the Google FLEURS test set.
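To make the WER figures concrete: WER is the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that calculation in plain Python. The function name is illustrative, and the evaluation behind the reported numbers may have applied text normalization that this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 6.5% corresponds to roughly 6 to 7 word errors per 100 reference words.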
Implementation Details
The model was trained on a combination of Tamil ASR corpora: IISc-MILE, ULCA, Shrutilipi, Microsoft Speech Corpus, Google FLEURS, and the Babel ASR Corpus. Training used an 8-bit AdamW optimizer with a linear learning rate scheduler and mixed precision. Key hyperparameters:
- Learning rate: 1e-05 with 17,500 warmup steps
- Batch size: 24 (training) / 48 (evaluation)
- Total training steps: 33,892
- Mixed precision training enabled
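The learning rate settings above can be sketched as a linear schedule with warmup: the rate rises linearly from zero to its peak over the warmup steps, then decays linearly. This is an illustrative plain-Python sketch, not the training code itself. It assumes the decay reaches zero at the listed total of 33,892 steps, which the description above does not confirm, and the function name is hypothetical.

```python
def linear_warmup_lr(
    step: int,
    base_lr: float = 1e-5,       # peak learning rate from the list above
    warmup_steps: int = 17_500,  # warmup steps from the list above
    total_steps: int = 33_892,   # ASSUMPTION: decay horizon equals total training steps
) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * (step / warmup_steps)
    # Decay phase: ramp linearly from base_lr down to 0, never below 0.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / (total_steps - warmup_steps))
```

With these values, warmup covers about half of the run, so the peak rate of 1e-05 is reached only at step 17,500.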
Core Capabilities
- High-accuracy Tamil speech recognition
- Supports both CPU and GPU inference
- Compatible with whisper-jax for faster inference
- Optimized for production deployment
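As a rough illustration of CPU or GPU inference, the sketch below uses the Hugging Face `transformers` ASR pipeline. Several things here are assumptions rather than facts from this page: the Hub model ID, the 30-second chunk length, and the helper names `pick_device` and `transcribe`. Forcing the output language is omitted because the mechanism differs between `transformers` versions.

```python
# ASSUMPTION: Hub ID of the model; not stated on this page, verify before use.
MODEL_ID = "vasista22/whisper-tamil-medium"


def pick_device(cuda_available: bool) -> str:
    """Return a device string: the first GPU if available, otherwise CPU."""
    return "cuda:0" if cuda_available else "cpu"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so the helpers above work without these packages installed.
    import torch
    from transformers import pipeline

    asr = pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # ASSUMPTION: split long audio into 30 s chunks
        device=pick_device(torch.cuda.is_available()),
    )
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a hypothetical local file, downloads the model on first use and returns the transcript as a string.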
Frequently Asked Questions
Q: What makes this model unique?
This model is fine-tuned specifically for Tamil on a diverse set of speech corpora and reports low WER on standard benchmarks (6.5% on Common Voice, 6.97% on Google FLEURS). It can also be run with whisper-jax for faster inference.
Q: What are the recommended use cases?
The model is ideal for Tamil speech transcription tasks, particularly in applications requiring high accuracy such as subtitling, content moderation, and speech analytics. It can be deployed in both research and production environments.