whisper-large-persian

Maintained By
steja

Whisper Large Persian

Property        Value
License         Apache 2.0
Base Model      whisper-large-v2-hi
Training Data   Mozilla Common Voice 11.0 (Persian)
WER Score       26.37%

What is whisper-large-persian?

Whisper-large-persian is a specialized automatic speech recognition (ASR) model fine-tuned specifically for the Persian language. Based on OpenAI's Whisper architecture, this model has been optimized using the Mozilla Common Voice 11.0 Persian dataset to provide accurate transcription capabilities for Persian speech.
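As a sketch, the model can be loaded through the Transformers ASR pipeline. The repository id `steja/whisper-large-persian` is inferred from the maintainer and model name on this card, and the chunking/device settings are common defaults rather than values taken from the card:

```python
from transformers import pipeline

# Assumed Hugging Face repository id, inferred from this card's maintainer and title.
MODEL_ID = "steja/whisper-large-persian"

def build_persian_asr(device: int = -1):
    """Create an ASR pipeline for Persian speech (downloads the model weights)."""
    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # Whisper processes audio in 30-second windows
        device=device,      # -1 = CPU, 0 = first GPU
    )

# Example usage (requires an audio file on disk):
# asr = build_persian_asr(device=0)
# print(asr("sample_fa.wav")["text"])
```

Building the pipeline fetches the full large checkpoint, so a GPU is advisable for anything beyond short clips.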

Implementation Details

The model was trained using a distributed multi-GPU setup with carefully tuned hyperparameters. Training was conducted over 1000 steps with a linear learning rate scheduler and warmup period. The implementation uses PyTorch and the Transformers library, demonstrating progressive improvement in performance throughout the training process.

  • Learning rate: 1e-05 with Adam optimizer
  • Batch size: 16 (total) with gradient accumulation
  • Training steps: 1000 with 100 warmup steps
  • Final validation loss: 0.3047
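The linear schedule with warmup implied by these hyperparameters can be sketched in a few lines. The card does not show the actual scheduler code, so this is a standard linear warmup/decay assumed from the stated values (peak 1e-5, 100 warmup steps, 1000 total steps):

```python
# Assumed values, taken from the hyperparameters listed above.
PEAK_LR = 1e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 1000

def linear_warmup_decay(step: int) -> float:
    """Linear ramp up to PEAK_LR over WARMUP_STEPS, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0, remaining) / (TOTAL_STEPS - WARMUP_STEPS)

# The rate ramps up for 100 steps, peaks at 1e-5, then decays linearly to 0 at step 1000.
```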

Core Capabilities

  • Persian speech recognition with 26.37% WER
  • Optimized for Mozilla Common Voice dataset
  • Progressive performance improvement (from 35.60% to 26.37% WER)
  • Suitable for production deployment via inference endpoints
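For context on the metric above: word error rate (WER) is the word-level edit distance between a hypothesis and a reference transcript, divided by the number of reference words. A minimal sketch of the computation (not the evaluation script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, computed with a single rolling row.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                d[j] + 1,                     # deletion
                d[j - 1] + 1,                 # insertion
                prev + (0 if r == h else 1),  # substitution (or match)
            )
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)
```

A WER of 26.37% thus means roughly one word-level error for every four reference words, averaged over the test set.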

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized optimization for Persian language ASR, achieving a competitive WER of 26.37% through careful fine-tuning of the Whisper large architecture.

Q: What are the recommended use cases?

The model is ideal for Persian speech transcription tasks, including content creation, subtitle generation, and voice command systems. It's particularly well-suited for applications that need Persian-language processing with moderate accuracy demands.
