whisper-tamil-medium

Maintained By
vasista22

Whisper Tamil Medium

Property             Value
License              Apache 2.0
Base Model           OpenAI Whisper Medium
Training Framework   PyTorch
Primary Task         Automatic Speech Recognition

What is whisper-tamil-medium?

Whisper-tamil-medium is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper-medium for Tamil. Developed at Speech Lab, IIT Madras, it reports a Word Error Rate (WER) of 6.5% on the Common Voice test set and 6.97% on the Google Fleurs test set.
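To make the WER figures concrete, here is a minimal sketch of how a word error rate can be computed for a single utterance: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the sample sentences are illustrative assumptions. Published scores are normally aggregated over a whole test set and depend on text normalization, so this is not the evaluation script behind the numbers above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from any test set): one word out of four differs.
reference = "நான் இன்று பள்ளிக்கு செல்கிறேன்"
hypothesis = "நான் இன்று பள்ளி செல்கிறேன்"
sample_wer = word_error_rate(reference, hypothesis)
```

Under this definition, a WER of 6.5% corresponds to roughly 65 word-level errors per 1,000 reference words.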

Implementation Details

The model was trained on a combination of Tamil ASR corpora: IISc-MILE, ULCA, Shrutilipi, Microsoft Speech Corpus, Google/Fleurs, and Babel ASR Corpus. Training used an 8-bit AdamW optimizer with a linear learning rate scheduler and mixed precision.

  • Learning rate: 1e-05 with 17,500 warmup steps
  • Batch size: 24 (training) / 48 (evaluation)
  • Total training steps: 33,892
  • Mixed precision training enabled
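The hyperparameters above imply a learning rate that ramps up for roughly the first half of training. The sketch below computes the rate at a given step, assuming linear warmup from zero to the peak followed by linear decay to zero at the final step. That decay shape is an assumption borrowed from the common linear-with-warmup schedule; the source only states "linear" and the warmup length. The function and constant names are hypothetical.

```python
PEAK_LR = 1e-5          # learning rate from the list above
WARMUP_STEPS = 17_500   # warmup steps from the list above
TOTAL_STEPS = 33_892    # total training steps from the list above


def linear_schedule_lr(
    step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TOTAL_STEPS,
) -> float:
    """Learning rate at `step` under linear warmup, then linear decay to zero (assumed)."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to the peak.
        return peak_lr * (step / warmup_steps)
    # Decay: scale linearly from the peak down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / (total_steps - warmup_steps))
```

For example, `linear_schedule_lr(8_750)` gives half the peak rate, and the peak of 1e-05 is reached only at step 17,500, just over halfway through the 33,892 steps.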

Core Capabilities

  • High-accuracy Tamil speech recognition
  • Supports both CPU and GPU inference
  • Compatible with whisper-jax for faster inference
  • Optimized for production deployment
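As a rough illustration of CPU or GPU inference, the sketch below loads the model through the Hugging Face `transformers` ASR pipeline and picks the device at run time. The Hub identifier `vasista22/whisper-tamil-medium` is assumed from the maintainer and model name, and the helper names are hypothetical. It also assumes `torch`, `transformers`, and ffmpeg are installed. Depending on the `transformers` version, the decoding language may additionally need to be set to Tamil explicitly; that step is not shown here.

```python
from typing import Any, Dict

# Assumed Hub id, built from the maintainer and model name on this page.
MODEL_ID = "vasista22/whisper-tamil-medium"


def pipeline_kwargs(use_gpu: bool) -> Dict[str, Any]:
    """Arguments for transformers.pipeline; only the device changes between CPU and GPU."""
    return {
        "task": "automatic-speech-recognition",
        "model": MODEL_ID,
        "chunk_length_s": 30,  # split long audio into 30-second windows
        "device": "cuda:0" if use_gpu else "cpu",
    }


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file. Downloads the model weights on first use."""
    # Heavy imports are deferred so pipeline_kwargs stays usable without them.
    import torch
    from transformers import pipeline

    asr = pipeline(**pipeline_kwargs(torch.cuda.is_available()))
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder path, would return the recognized Tamil text as a string.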

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its Tamil-specific fine-tuning on a diverse set of corpora and for its reported WER of 6.5% on Common Voice and 6.97% on Google Fleurs. It can also be run with whisper-jax for faster inference.

Q: What are the recommended use cases?

The model suits Tamil speech transcription tasks, particularly applications that need high accuracy such as subtitling, content moderation, and speech analytics. It can be deployed in both research and production environments.
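For the subtitling use case, timestamped transcription segments can be turned into SubRip (SRT) text. The sketch below assumes each segment is a dict with a `"timestamp"` pair in seconds and a `"text"` string, which is the shape the `transformers` ASR pipeline returns under `"chunks"` when called with `return_timestamps=True`. The helper names are hypothetical, and falling back to the start time when an end time is missing is a simplification.

```python
from typing import Iterable, Mapping


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: Iterable[Mapping]) -> str:
    """Build SRT text from segments shaped like {"timestamp": (start, end), "text": ...}."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Simplification: a segment without an end time gets zero duration.
            end = start
        entries.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{chunk['text'].strip()}\n"
        )
    # Entries are separated by a blank line, as SRT expects.
    return "\n".join(entries)
```

The resulting string can be written to a `.srt` file with UTF-8 encoding so that the Tamil script is preserved.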
