# Whisper Large Persian
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Base Model | whisper-large-v2-hi |
| Training Data | Mozilla Common Voice 11.0 (Persian) |
| WER | 26.37% |
## What is whisper-large-persian?

Whisper-large-persian is an automatic speech recognition (ASR) model fine-tuned for Persian. Based on OpenAI's Whisper architecture, it was optimized on the Mozilla Common Voice 11.0 Persian dataset to provide accurate transcription of Persian speech.
## Implementation Details

The model was trained on a distributed multi-GPU setup with carefully tuned hyperparameters: 1000 training steps with a linear learning-rate scheduler and a warmup period. The implementation uses PyTorch and the Transformers library, and validation performance improved steadily throughout training.
- Learning rate: 1e-05 with Adam optimizer
- Batch size: 16 (effective, via gradient accumulation)
- Training steps: 1000 with 100 warmup steps
- Final validation loss: 0.3047
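The reported schedule (1e-05 peak learning rate, 100 warmup steps, 1000 total steps) can be sketched in plain Python. This is a sketch of the standard linear warmup-then-decay shape, not the exact scheduler code used in training:

```python
def lr_at_step(step, peak_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to peak_lr, then linear decay toward zero.

    Defaults mirror the reported settings (1e-05 peak, 100 warmup,
    1000 total steps); the exact training scheduler is assumed here.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr at warmup_steps to 0 at total_steps.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))
```

Halfway through warmup the rate is half the peak, and it returns to zero exactly at the final step.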
## Core Capabilities
- Persian speech recognition with 26.37% WER
- Optimized for Mozilla Common Voice dataset
- Progressive performance improvement (from 35.60% to 26.37% WER)
- Suitable for production deployment via inference endpoints
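For reference, the WER figures above are word-level edit distance divided by the number of reference words. A minimal implementation (illustrative only, not the evaluation script used for this model) looks like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)
```

A 26.37% WER means roughly one word in four in the hypothesis transcripts required an edit relative to the reference.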
## Frequently Asked Questions
**Q: What makes this model unique?**

A: It is specialized for Persian-language ASR, achieving a competitive WER of 26.37% through careful fine-tuning of the Whisper large architecture.
**Q: What are the recommended use cases?**

A: The model is well suited to Persian speech transcription tasks such as content creation, subtitle generation, and voice command systems, wherever moderate-accuracy Persian transcription is acceptable.