# Whisper Medium Portuguese
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Training Dataset | Common Voice 11.0 |
| Best WER Score | 6.59% |
| Training Steps | 6000 |
## What is whisper-medium-portuguese?
Whisper Medium Portuguese is a fine-tuned version of OpenAI's Whisper Medium model, specifically optimized for Portuguese speech recognition. This model achieves state-of-the-art performance with a Word Error Rate (WER) of 6.59%, surpassing both the original Whisper Medium (8.1% WER) and even Whisper Large (7.1% WER) on Portuguese transcription tasks.
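For orientation, here is a minimal inference sketch using the Hugging Face `transformers` ASR pipeline. The Hub repository ID and the audio file name are illustrative assumptions, not confirmed by this card; substitute the actual model ID from the hosting page.

```python
# Minimal inference sketch with the transformers ASR pipeline.
# NOTE: the model ID and audio path below are illustrative assumptions.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="pierreguillou/whisper-medium-portuguese",  # assumed Hub ID
    device=0 if torch.cuda.is_available() else -1,    # GPU if available
)

# Transcribe a local Portuguese audio file (path is illustrative).
result = asr("exemplo_pt.wav")
print(result["text"])
```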
## Implementation Details
The model was trained with a carefully tuned configuration: Adam optimizer, a linear learning rate schedule, and mixed precision training. Key hyperparameters include a learning rate of 9e-06, a batch size of 32, and 6000 training steps with 500 warmup steps (see the configuration sketch after the list below).
- Native AMP (Automatic Mixed Precision) training implementation
- Trained on Mozilla Common Voice 11.0 dataset
- Achieved optimal performance at epoch 5.05 with validation loss of 0.2628
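These hyperparameters map naturally onto `transformers` `Seq2SeqTrainingArguments`. The sketch below is a reconstruction under stated assumptions, not the exact training script: only the values named in this section come from the card, while the output directory and evaluation cadence are placeholders.

```python
# Sketch of the training configuration described above, expressed as
# transformers Seq2SeqTrainingArguments. Values marked "from the card"
# are taken from this section; everything else is an assumption.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-pt",   # assumed output directory
    learning_rate=9e-6,                 # from the card
    per_device_train_batch_size=32,     # from the card
    max_steps=6000,                     # from the card
    warmup_steps=500,                   # from the card
    lr_scheduler_type="linear",         # linear LR schedule, from the card
    fp16=True,                          # native AMP mixed precision
    optim="adamw_torch",                # Adam-family optimizer
    evaluation_strategy="steps",        # assumed evaluation cadence
    eval_steps=1000,                    # assumed
    predict_with_generate=True,         # needed to compute WER during eval
)
```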
## Core Capabilities
- State-of-the-art Portuguese speech recognition
- Robust performance on varied Portuguese audio inputs
- Significantly improved accuracy compared to base Whisper models
- Optimized for production deployment
## Frequently Asked Questions
Q: What makes this model unique?
A: This model transcribes Portuguese more accurately than both the Whisper Medium and Whisper Large baselines, with a WER of 6.59% versus 8.1% for the original Medium and 7.1% for Large, making it state-of-the-art for Portuguese ASR on the Common Voice 11.0 benchmark.
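For context on how a WER figure like 6.59% is computed, here is a small sketch with the `evaluate` library; the reference and prediction sentences are made up for illustration, not drawn from the evaluation set.

```python
# WER = (substitutions + insertions + deletions) / reference word count.
# The sentences below are illustrative, not from the evaluation set.
import evaluate

wer = evaluate.load("wer")
references = ["o gato dorme no sofá"]
predictions = ["o gato dorme no sofa"]  # one substituted word out of five
print(wer.compute(references=references, predictions=predictions))  # 0.2
```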
Q: What are the recommended use cases?
A: The model is well suited to Portuguese speech transcription tasks, including subtitle generation, audio content indexing, and voice command systems that require high accuracy in Portuguese; a timestamped transcription sketch for the subtitle case follows.
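As one illustration of the subtitle use case, the sketch below uses the chunked long-form mode of the `transformers` ASR pipeline with timestamps. The model ID and audio file name are assumptions, as before.

```python
# Subtitle-style transcription sketch: chunked long-form decoding with
# timestamps. Model ID and audio path are illustrative assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="pierreguillou/whisper-medium-portuguese",  # assumed Hub ID
    chunk_length_s=30,  # process long audio in 30 s chunks
)

result = asr("palestra_pt.mp3", return_timestamps=True)
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]  # (start_s, end_s); end may be None
    print(f"[{start} - {end}] {chunk['text']}")
```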