whisper-medium-urdu

ihanif

A Whisper medium model fine-tuned for Urdu speech recognition. It has 764M parameters, reaches 26.98% WER on Common Voice, and is released under the Apache 2.0 license.

Property          Value
Parameter Count   764M
License           Apache 2.0
Framework         PyTorch
WER Score         26.98%

What is whisper-medium-urdu?

Whisper-medium-urdu is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper medium model for the Urdu language. It was trained on version 11.0 of the Mozilla Common Voice dataset.

Implementation Details

The model uses a transformer-based architecture with 764M parameters and is implemented in PyTorch. It was trained with mixed precision (Native AMP) using the Adam optimizer (β1=0.9, β2=0.999, ε=1e-08) and the following settings:

  • Learning rate: 1e-05 with linear scheduler
  • Batch sizes: 32 (training) and 16 (evaluation)
  • Training steps: 300 with 40 warmup steps
  • Best validation loss: 0.4685
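To make the schedule concrete, here is a minimal sketch of how the learning rate would evolve over the 300 steps. It assumes the "linear scheduler" ramps up linearly during the 40 warmup steps and then decays linearly to zero, which is the usual shape of a linear schedule with warmup; the source does not spell this out, and the function name is hypothetical.

```python
PEAK_LR = 1e-5       # learning rate from the list above
WARMUP_STEPS = 40    # warmup steps from the list above
TOTAL_STEPS = 300    # training steps from the list above


def linear_schedule_lr(step: int) -> float:
    """Learning rate at `step`, assuming linear warmup then linear decay to zero."""
    if step < WARMUP_STEPS:
        # Ramp up linearly from 0 to PEAK_LR over the warmup phase.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay linearly from PEAK_LR at the end of warmup to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 20, 40, 170, 300):
        print(step, linear_schedule_lr(step))
```

Under this assumption the rate peaks at 1e-05 at step 40 and is back at half that value by step 170, so most of the short run happens at a decaying rate.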

Core Capabilities

  • Urdu-specific speech recognition
  • WER of 26.98% on the test set
  • Weights stored as F32 (32-bit floating point) tensors
  • TensorBoard training logs
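The parameter count and the F32 tensor type together give a rough lower bound on memory for the weights alone. The sketch below is simple arithmetic, not a measurement: it treats "764M" as exactly 764,000,000 parameters and ignores activations, audio buffers, and framework overhead. The helper name is hypothetical.

```python
PARAM_COUNT = 764_000_000  # "764M" from the model card, treated as exact (assumption)
BYTES_PER_F32 = 4          # a 32-bit float occupies 4 bytes
BYTES_PER_F16 = 2          # for comparison only; the card lists F32


def weight_memory_gb(param_count: int, bytes_per_param: int) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"F32 weights: {weight_memory_gb(PARAM_COUNT, BYTES_PER_F32):.2f} GB")
    print(f"F16 weights: {weight_memory_gb(PARAM_COUNT, BYTES_PER_F16):.2f} GB")
```

In F32 the weights come to roughly 3 GB, so casting to half precision, where the hardware supports it, would about halve that footprint.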

Frequently Asked Questions

Q: What makes this model unique?

This model is fine-tuned specifically for Urdu speech recognition rather than serving as a general-purpose multilingual model, and it reaches a WER of 26.98% on the Common Voice test set.
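For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; it is not the evaluation script behind the 26.98% figure, which may also apply text normalization not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

A WER of 26.98% therefore means roughly 27 word-level errors per 100 reference words.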

Q: What are the recommended use cases?

The model suits Urdu speech transcription tasks such as automated subtitling, voice command systems, and other speech-to-text applications for Urdu content.
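A transcription task like these could be sketched with the Hugging Face `transformers` pipeline. This is an illustrative sketch, not the author's documented usage: the Hub identifier `ihanif/whisper-medium-urdu` is inferred from the author and model name on this page and should be verified, and decoding an audio file from a path additionally assumes `ffmpeg` is installed.

```python
# Assumed Hub ID, inferred from the author and model name; verify before use.
MODEL_ID = "ihanif/whisper-medium-urdu"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # process long recordings in 30-second windows
    )
    return asr(audio_path)["text"]


if __name__ == "__main__":
    import sys

    # Usage: python transcribe.py path/to/audio-file
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

The first call downloads the model weights, so for batch subtitling it would be better to build the pipeline once and reuse it across files rather than calling `transcribe` repeatedly.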
