whisper-medium-urdu

Maintained by: ihanif

Whisper Medium Urdu

Property          Value
Parameter Count   764M
License           Apache 2.0
Framework         PyTorch
WER Score         26.98%

What is whisper-medium-urdu?

Whisper-medium-urdu is an automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper medium model for the Urdu language. It was trained on the Mozilla Common Voice dataset, version 11.0, and reports a word error rate (WER) of 26.98% on the test set.

Implementation Details

The model uses a transformer-based architecture with 764M parameters and is implemented in PyTorch. Fine-tuning used mixed-precision training (native AMP) and the Adam optimizer (β1=0.9, β2=0.999, ε=1e-08), with the following settings:

  • Learning rate: 1e-05 with linear scheduler
  • Batch sizes: 32 (training) and 16 (evaluation)
  • Training steps: 300 with 40 warmup steps
  • Best validation loss: 0.4685
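The learning-rate settings above can be sketched as a small function. This is a minimal illustration, assuming the warmup rises linearly from zero and the decay falls linearly to zero at the final step; the source only says "linear scheduler", so the exact shape is an assumption, and the function name is hypothetical.

```python
BASE_LR = 1e-5        # peak learning rate from the list above
WARMUP_STEPS = 40     # warmup steps from the list above
TOTAL_STEPS = 300     # total training steps from the list above


def linear_schedule_lr(step: int,
                       base_lr: float = BASE_LR,
                       warmup_steps: int = WARMUP_STEPS,
                       total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step.

    Assumed shape: linear ramp from 0 to base_lr during warmup,
    then linear decay from base_lr to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 20, 40, 170, 300):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 40 and is back to half its peak at step 170, the midpoint of the 260 decay steps.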

Core Capabilities

  • Specialized Urdu speech recognition
  • WER of 26.98% on the test set
  • Weights stored as F32 (32-bit floating point) tensors
  • TensorBoard support for training metrics
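As a rough guide to what the F32 tensor type implies, the weight footprint can be estimated from the parameter count. This is back-of-the-envelope arithmetic, not a measured figure: it covers weights only (no activations, optimizer state, or framework overhead), and the float16 row is a hypothetical comparison rather than a format the source says is published.

```python
PARAM_COUNT = 764_000_000  # 764M parameters, as listed in the property table

# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(param_count: int, dtype: str = "float32") -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(PARAM_COUNT, dtype):.3f} GB")
```

At 4 bytes per parameter, the F32 weights alone come to roughly 3 GB.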

Frequently Asked Questions

Q: What makes this model unique?

This model is fine-tuned specifically for Urdu speech recognition and reports a WER of 26.98% on the test set. Fine-tuning on Urdu data is intended to improve on generic multilingual speech recognition models when they are applied to Urdu content.
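For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain implementation of that definition; it applies no text normalization, so it will not necessarily reproduce the reported figure, which depends on how the evaluation normalized its text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)
```

A WER of 26.98% therefore means roughly 27 word-level errors per 100 reference words.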

Q: What are the recommended use cases?

The model suits Urdu speech transcription tasks such as automated subtitling, voice command systems, and other speech-to-text applications for Urdu content.
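The subtitling use case might look like the sketch below, which transcribes an audio file with the Hugging Face `transformers` ASR pipeline and formats the timestamped chunks as SRT. The Hub id `ihanif/whisper-medium-urdu` is an assumption inferred from the maintainer and model name, and the helper names are hypothetical; treat this as a starting point rather than the maintainer's recommended workflow.

```python
from typing import Dict, List

# Assumption: Hub id inferred from the maintainer and model name, not confirmed by the source.
MODEL_ID = "ihanif/whisper-medium-urdu"


def transcribe_with_timestamps(audio_path: str, model_id: str = MODEL_ID) -> List[Dict]:
    """Run the ASR pipeline and return timestamped chunks (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: the SRT helpers below do not need it

    asr = pipeline("automatic-speech-recognition", model=model_id, chunk_length_s=30)
    result = asr(audio_path, return_timestamps=True)
    return result["chunks"]  # each chunk: {"timestamp": (start, end), "text": "..."}


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks: List[Dict]) -> str:
    """Turn timestamped chunks into SRT subtitle text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # the final chunk may lack an end time; fall back to its start
            end = start
        text = chunk["text"].strip()
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Calling `chunks_to_srt(transcribe_with_timestamps("clip.wav"))` would produce subtitle text for a hypothetical `clip.wav`. Keeping the formatting separate from the model call lets the SRT logic be checked without downloading the weights.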
