Whisper Tamil Medium

Property	Value
License	Apache 2.0
Base Model	OpenAI Whisper Medium
Training Framework	PyTorch
Primary Task	Automatic Speech Recognition

What is whisper-tamil-medium?

Whisper-tamil-medium is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper-medium for Tamil. Developed at Speech Lab, IIT Madras, it reports a Word Error Rate (WER) of 6.5% on the Common Voice test set and 6.97% on the Google Fleurs test set.
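For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration of that definition. The function name word_error_rate and the sample sentences are illustrative, not taken from the model's evaluation code, and published benchmark figures usually involve text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("நான் இன்று பள்ளிக்கு செல்கிறேன்",
                      "நான் இன்று பள்ளி செல்கிறேன்"))  # 0.25
```

A WER of 6.5% therefore corresponds to roughly 6 to 7 word-level errors per 100 reference words.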

Implementation Details

The model was trained on a combination of Tamil ASR corpora: IISc-MILE, ULCA, Shrutilipi, Microsoft Speech Corpus, Google/Fleurs, and Babel ASR Corpus. Training used an 8-bit AdamW optimizer with a linear learning rate scheduler and mixed precision.

Learning rate: 1e-05 with 17,500 warmup steps
Batch size: 24 (training) / 48 (evaluation)
Total training steps: 33,892
Mixed precision training enabled
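To make the schedule concrete, the sketch below computes the learning rate at a given step from the values listed above. It assumes the common form of a linear scheduler, with a linear ramp from 0 to the peak rate during warmup followed by a linear decay to 0 at the final step. The exact scheduler implementation is not stated here, so treat this as an illustration; linear_schedule_lr is a hypothetical helper name.

```python
BASE_LR = 1e-5         # peak learning rate
WARMUP_STEPS = 17_500  # steps spent ramping up
TOTAL_STEPS = 33_892   # total training steps


def linear_schedule_lr(step: int,
                       base_lr: float = BASE_LR,
                       warmup_steps: int = WARMUP_STEPS,
                       total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` under linear warmup then linear decay (assumed form)."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 8_750, 17_500, 25_696, 33_892):
    print(step, linear_schedule_lr(step))
```

Under this assumption the warmup phase covers roughly half of the run (17,500 of 33,892 steps), so the rate peaks near the midpoint of training.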

Core Capabilities

High-accuracy Tamil speech recognition
Supports both CPU and GPU inference
Compatible with whisper-jax for faster inference
Optimized for production deployment
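A minimal inference sketch follows, assuming the checkpoint is published on the Hugging Face Hub and loaded through the transformers pipeline API. The repository ID in MODEL_ID is an assumption, not something stated on this page, so verify it before use. pick_device and transcribe are hypothetical helper names. Decoding an audio file path this way also requires ffmpeg, and how to force the output language depends on the installed transformers version, so that step is left out.

```python
# Assumed Hub repository ID; confirm it before relying on it.
MODEL_ID = "vasista22/whisper-tamil-medium"


def pick_device() -> str:
    """Return "cuda:0" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the ASR pipeline."""
    # Imported lazily so this module loads without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long recordings into 30-second chunks
        device=pick_device(),
    )
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # Placeholder path; replace with a real recording.
    print(transcribe("/path/to/audio.wav"))
```

The same function covers both CPU and GPU inference because the device is chosen at call time.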

Frequently Asked Questions

Q: What makes this model unique?

The model is fine-tuned specifically for Tamil on a diverse set of speech corpora and reports WERs of 6.5% on Common Voice and 6.97% on Google Fleurs. It also works with whisper-jax for faster inference.

Q: What are the recommended use cases?

The model suits Tamil speech transcription, particularly applications that need high accuracy, such as subtitling, content moderation, and speech analytics. It can be deployed in both research and production environments.