# Whisper Large Persian
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Base Model | whisper-large-v2-hi |
| Training Data | Mozilla Common Voice 11.0 (Persian) |
| WER | 26.37% |
## What is whisper-large-persian?

Whisper-large-persian is an automatic speech recognition (ASR) model fine-tuned for Persian. Based on OpenAI's Whisper architecture, it was optimized on the Mozilla Common Voice 11.0 Persian dataset to provide accurate transcription of Persian speech.
## Implementation Details

The model was trained on a distributed multi-GPU setup with carefully tuned hyperparameters: 1000 training steps with a linear learning-rate scheduler and a warmup period. The implementation uses PyTorch and the Transformers library, and validation performance improved steadily throughout training.
- Learning rate: 1e-05 with Adam optimizer
- Batch size: 16 (effective, via gradient accumulation)
- Training steps: 1000 with 100 warmup steps
- Final validation loss: 0.3047
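The reported schedule (1e-05 peak learning rate, 100 warmup steps, 1000 total steps) can be sketched in plain Python. This is a sketch of the standard linear warmup-then-decay shape, not the exact scheduler code used in training:

```python
def lr_at_step(step, peak_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to peak_lr, then linear decay toward zero.

    Defaults mirror the reported settings (1e-05 peak, 100 warmup,
    1000 total steps); the exact training scheduler is assumed here.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr at warmup_steps to 0 at total_steps.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))
```

Halfway through warmup the rate is half the peak, and it returns to zero exactly at the final step.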
## Core Capabilities
- Persian speech recognition with 26.37% WER
- Optimized for Mozilla Common Voice dataset
- Progressive performance improvement (from 35.60% to 26.37% WER)
- Suitable for production deployment via inference endpoints
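For reference, the WER figures above are word-level edit distance divided by the number of reference words. A minimal implementation (illustrative only, not the evaluation script used for this model) looks like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)
```

A 26.37% WER means roughly one word in four in the hypothesis transcripts required an edit relative to the reference.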
## Frequently Asked Questions
**Q: What makes this model unique?**

A: It is specialized for Persian-language ASR, achieving a competitive WER of 26.37% through careful fine-tuning of the Whisper large architecture.
**Q: What are the recommended use cases?**

A: The model is well suited to Persian speech transcription tasks such as content creation, subtitle generation, and voice command systems, wherever moderate-accuracy Persian transcription is acceptable.