Whisper Medium Urdu

Property	Value
Parameter Count	764M
License	Apache 2.0
Framework	PyTorch
WER Score	26.98%

What is whisper-medium-urdu?

Whisper-medium-urdu is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper medium model for the Urdu language. It was trained on version 11.0 of the Mozilla Common Voice dataset.

Implementation Details

The model uses a transformer-based architecture with 764M parameters and is implemented in PyTorch. It was trained with mixed precision (Native AMP) using the Adam optimizer with β1=0.9, β2=0.999, and ε=1e-08, which match PyTorch's default Adam settings.
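To make the optimizer settings concrete, here is a minimal sketch of a single Adam update for one scalar parameter, using the β1, β2, and ε values quoted above and the learning rate of 1e-05 reported for this model. The function name `adam_step` and the toy gradient are illustrative assumptions. The actual training run would have relied on PyTorch's own optimizer rather than code like this.

```python
import math

def adam_step(param, grad, m, v, step, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter (illustrative only)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Hypothetical starting point: parameter 1.0, gradient 0.5, first step.
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate.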

Learning rate: 1e-05 with linear scheduler
Batch sizes: 32 (training) and 16 (evaluation)
Training steps: 300 with 40 warmup steps
Best validation loss: 0.4685
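
The learning rate settings above can be sketched as a small function. This assumes the "linear scheduler" means a linear warmup to the peak rate followed by a linear decay to zero at the final step, which is a common convention but is not spelled out here. The function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(step, peak_lr=1e-5, warmup_steps=40, total_steps=300):
    """Learning rate at a given step, assuming linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)

for step in (0, 20, 40, 170, 300):
    print(step, linear_warmup_lr(step))
```

Under this assumption the rate peaks at step 40 and is back to half the peak at step 170, midway through the decay phase.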

Core Capabilities

Specialized Urdu speech recognition
WER of 26.98% on the test set
Weights stored as F32 (32-bit floating point) tensors
Training metrics logged with TensorBoard
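
As a rough guide to what the F32 tensor type implies, the sketch below estimates the size of the weights alone from the 764M parameter count. The float16 row is a hypothetical comparison and does not describe a published variant of this model. Activations and decoding buffers are not counted.

```python
PARAMS = 764_000_000  # parameter count from the table above

# Bytes per parameter for each dtype; float16 is included only for comparison.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_size_gb(num_params, dtype):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

fp32_gb = weight_size_gb(PARAMS, "float32")
fp16_gb = weight_size_gb(PARAMS, "float16")
print(f"float32: {fp32_gb:.2f} GB, float16: {fp16_gb:.2f} GB")
```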

Frequently Asked Questions

Q: What makes this model unique?

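This model is fine-tuned specifically for Urdu speech recognition and reaches a WER of 26.98% on its test set, which is roughly 27 word errors per 100 reference words. Because it is adapted to Urdu data, it is intended to transcribe Urdu more accurately than a general-purpose speech recognition model, although no baseline figure is given for comparison.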
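
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that calculation. The function name `word_error_rate` and the example sentences are illustrative, and the text normalization used to obtain the 26.98% figure is not stated here, so this will not necessarily reproduce that number.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of five reference words.
wer = word_error_rate("میں اسکول جا رہا ہوں", "میں اسکول جا رہی ہوں")
print(f"WER: {wer:.2%}")
```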

Q: What are the recommended use cases?

The model is suited to Urdu speech transcription tasks such as automated subtitling, voice command systems, and other speech-to-text applications for Urdu content.
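
A transcription call might look like the sketch below, which assumes the checkpoint is published on the Hugging Face Hub and that the `transformers` library is installed. The repository ID `your-namespace/whisper-medium-urdu` is a hypothetical placeholder, since the actual ID is not given here, and `transcribe` is an illustrative helper name.

```python
def transcribe(audio_path, model_id="your-namespace/whisper-medium-urdu"):
    """Transcribe an audio file with a Whisper checkpoint (illustrative sketch)."""
    # Imported lazily so the module loads even where transformers is absent.
    from transformers import pipeline

    # chunk_length_s splits long recordings into 30-second windows,
    # which matches the input length Whisper models expect.
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,  # hypothetical placeholder; replace with the real ID
        chunk_length_s=30,
    )
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` with a real repository ID downloads the weights on first use. Decoding audio files from a path typically also requires ffmpeg to be available on the system.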