Whisper Telugu Large-v2
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Language | Telugu |
| Base Model | Whisper Large-v2 |
| WER | 9.65 (Google FLEURS test set) |
What is whisper-telugu-large-v2?
Whisper-telugu-large-v2 is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper Large-v2 specifically for the Telugu language. Developed at Speech Lab, IIT Madras, it was trained on multiple publicly available Telugu speech corpora, including CSTD IIIT-H, ULCA, Shrutilipi, and the Microsoft Speech Corpus.
Implementation Details
The model was fine-tuned with a learning rate of 0.75e-05, a batch size of 8, and 75,000 training steps. Training used mixed precision with the AdamW optimizer and a linear learning rate scheduler with 22,000 warmup steps.
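As a sketch, the stated hyperparameters would map onto a fine-tuning configuration like the following. The key names follow Hugging Face `Seq2SeqTrainingArguments` conventions; the actual training script is not published in this card, so this is an assumed layout, not the authors' code:

```python
# Hypothetical fine-tuning configuration mirroring the hyperparameters
# stated above. Key names follow Hugging Face Seq2SeqTrainingArguments
# conventions; the real training script is not part of this card.
training_config = {
    "learning_rate": 0.75e-05,         # as stated in the card
    "per_device_train_batch_size": 8,  # batch size of 8
    "max_steps": 75_000,               # 75,000 training steps
    "warmup_steps": 22_000,            # linear scheduler warmup
    "lr_scheduler_type": "linear",
    "fp16": True,                      # mixed precision training
    "optim": "adamw_torch",            # AdamW optimizer
}
```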
- Supports both PyTorch and JAX-based inference
- Optimized for 30-second audio chunks
- Includes specialized decoder prompts for Telugu language
- Implements 8-bit optimization for improved efficiency
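Because the model is optimized for 30-second inputs, longer recordings need to be split before inference. A minimal chunking sketch with NumPy (the 16 kHz sample rate matches Whisper's expected input; the helper name is ours, not part of the model's API):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper models consume 16 kHz mono audio
CHUNK_SECONDS = 30     # the model is optimized for 30-second chunks

def chunk_audio(waveform: np.ndarray) -> list[np.ndarray]:
    """Split a mono waveform into 30-second chunks, zero-padding the last."""
    chunk_len = SAMPLE_RATE * CHUNK_SECONDS
    chunks = []
    for start in range(0, len(waveform), chunk_len):
        chunk = waveform[start:start + chunk_len]
        if len(chunk) < chunk_len:
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks

# A 70-second recording yields three 30-second chunks (the last is padded).
audio = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
print(len(chunk_audio(audio)))  # 3
```

Each chunk can then be passed to the model's feature extractor independently, which is also what enables the batch processing mentioned below.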
Core Capabilities
- Achieves 9.65 WER on Google FLEURS test set
- Handles diverse Telugu speech patterns and accents
- Supports batch processing for faster inference
- Compatible with both CPU and GPU environments
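The headline metric above is word error rate. As a quick illustration of what WER measures, here is a generic word-level edit-distance implementation (this is not the FLEURS evaluation script, just a reference definition of the metric):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word reference -> 25% WER.
print(wer("the cat sat down", "the cat sat up"))  # 0.25
```

A reported WER of 9.65 therefore means roughly one word error per ten reference words on the FLEURS Telugu test set.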
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its comprehensive training on multiple Telugu speech corpora and its optimization for production environments, supporting both PyTorch and JAX-based inference pipelines. The achieved WER of 9.65 demonstrates its high accuracy in Telugu speech recognition.
Q: What are the recommended use cases?
The model is ideal for Telugu speech transcription tasks, particularly in applications requiring high accuracy and processing of longer audio segments. It's suitable for both research and production environments, with flexible deployment options using either PyTorch or JAX.