Whisper Telugu Large-v2
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Language | Telugu |
| Base Model | Whisper Large-v2 |
| WER | 9.65 (Google FLEURS test set) |
What is whisper-telugu-large-v2?
Whisper-telugu-large-v2 is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper Large-v2 specifically for the Telugu language. Developed at Speech Lab, IIT Madras, it was trained on multiple publicly available Telugu speech corpora, including CSTD IIIT-H, ULCA, Shrutilipi, and the Microsoft Speech Corpus.
Implementation Details
The model was fine-tuned with a learning rate of 0.75e-05, a batch size of 8, and 75,000 training steps. Training used mixed precision with the AdamW optimizer and a linear learning rate scheduler with 22,000 warmup steps.
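As a sketch, the stated hyperparameters would map onto a fine-tuning configuration like the following. The key names follow Hugging Face `Seq2SeqTrainingArguments` conventions; the actual training script is not published in this card, so this is an assumed layout, not the authors' code:

```python
# Hypothetical fine-tuning configuration mirroring the hyperparameters
# stated above. Key names follow Hugging Face Seq2SeqTrainingArguments
# conventions; the real training script is not part of this card.
training_config = {
    "learning_rate": 0.75e-05,         # as stated in the card
    "per_device_train_batch_size": 8,  # batch size of 8
    "max_steps": 75_000,               # 75,000 training steps
    "warmup_steps": 22_000,            # linear scheduler warmup
    "lr_scheduler_type": "linear",
    "fp16": True,                      # mixed precision training
    "optim": "adamw_torch",            # AdamW optimizer
}
```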
- Supports both PyTorch and JAX-based inference
- Optimized for 30-second audio chunks
- Includes specialized decoder prompts for Telugu language
- Implements 8-bit optimization for improved efficiency
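Because the model is optimized for 30-second inputs, longer recordings need to be split before inference. A minimal chunking sketch with NumPy (the 16 kHz sample rate matches Whisper's expected input; the helper name is ours, not part of the model's API):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper models consume 16 kHz mono audio
CHUNK_SECONDS = 30     # the model is optimized for 30-second chunks

def chunk_audio(waveform: np.ndarray) -> list[np.ndarray]:
    """Split a mono waveform into 30-second chunks, zero-padding the last."""
    chunk_len = SAMPLE_RATE * CHUNK_SECONDS
    chunks = []
    for start in range(0, len(waveform), chunk_len):
        chunk = waveform[start:start + chunk_len]
        if len(chunk) < chunk_len:
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks

# A 70-second recording yields three 30-second chunks (the last is padded).
audio = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
print(len(chunk_audio(audio)))  # 3
```

Each chunk can then be passed to the model's feature extractor independently, which is also what enables the batch processing mentioned below.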
Core Capabilities
- Achieves 9.65 WER on Google FLEURS test set
- Handles diverse Telugu speech patterns and accents
- Supports batch processing for faster inference
- Compatible with both CPU and GPU environments
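The headline metric above is word error rate. As a quick illustration of what WER measures, here is a generic word-level edit-distance implementation (this is not the FLEURS evaluation script, just a reference definition of the metric):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word reference -> 25% WER.
print(wer("the cat sat down", "the cat sat up"))  # 0.25
```

A reported WER of 9.65 therefore means roughly one word error per ten reference words on the FLEURS Telugu test set.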
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its comprehensive training on multiple Telugu speech corpora and its optimization for production environments, supporting both PyTorch and JAX-based inference pipelines. The achieved WER of 9.65 demonstrates its high accuracy in Telugu speech recognition.
Q: What are the recommended use cases?
The model is ideal for Telugu speech transcription tasks, particularly in applications requiring high accuracy and processing of longer audio segments. It's suitable for both research and production environments, with flexible deployment options using either PyTorch or JAX.