# Whisper Medium Portuguese
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Training Dataset | Common Voice 11.0 |
| Best WER Score | 6.59% |
| Training Steps | 6000 |
## What is whisper-medium-portuguese?
Whisper Medium Portuguese is a fine-tuned version of OpenAI's Whisper Medium model, specifically optimized for Portuguese speech recognition. This model achieves state-of-the-art performance with a Word Error Rate (WER) of 6.59%, surpassing both the original Whisper Medium (8.1% WER) and even Whisper Large (7.1% WER) on Portuguese transcription tasks.
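For orientation, here is a minimal inference sketch using the Hugging Face `transformers` ASR pipeline. The Hub repository ID and the audio file name are illustrative assumptions, not confirmed by this card; substitute the actual model ID from the hosting page.

```python
# Minimal inference sketch with the transformers ASR pipeline.
# NOTE: the model ID and audio path below are illustrative assumptions.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="pierreguillou/whisper-medium-portuguese",  # assumed Hub ID
    device=0 if torch.cuda.is_available() else -1,    # GPU if available
)

# Transcribe a local Portuguese audio file (path is illustrative).
result = asr("exemplo_pt.wav")
print(result["text"])
```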
## Implementation Details
The model was trained with a carefully tuned configuration: Adam optimizer, a linear learning rate schedule, and mixed precision training. Key hyperparameters include a learning rate of 9e-06, a batch size of 32, and 6000 training steps with 500 warmup steps (see the configuration sketch after the list below).
- Native AMP (Automatic Mixed Precision) training implementation
- Trained on Mozilla Common Voice 11.0 dataset
- Achieved optimal performance at epoch 5.05 with validation loss of 0.2628
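These hyperparameters map naturally onto `transformers` `Seq2SeqTrainingArguments`. The sketch below is a reconstruction under stated assumptions, not the exact training script: only the values named in this section come from the card, while the output directory and evaluation cadence are placeholders.

```python
# Sketch of the training configuration described above, expressed as
# transformers Seq2SeqTrainingArguments. Values marked "from the card"
# are taken from this section; everything else is an assumption.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-pt",   # assumed output directory
    learning_rate=9e-6,                 # from the card
    per_device_train_batch_size=32,     # from the card
    max_steps=6000,                     # from the card
    warmup_steps=500,                   # from the card
    lr_scheduler_type="linear",         # linear LR schedule, from the card
    fp16=True,                          # native AMP mixed precision
    optim="adamw_torch",                # Adam-family optimizer
    evaluation_strategy="steps",        # assumed evaluation cadence
    eval_steps=1000,                    # assumed
    predict_with_generate=True,         # needed to compute WER during eval
)
```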
## Core Capabilities
- State-of-the-art Portuguese speech recognition
- Robust performance on varied Portuguese audio inputs
- Significantly improved accuracy compared to base Whisper models
- Optimized for production deployment
## Frequently Asked Questions
Q: What makes this model unique?
A: This model transcribes Portuguese more accurately than both the Whisper Medium and Whisper Large baselines, with a WER of 6.59% versus 8.1% for the original Medium and 7.1% for Large, making it state-of-the-art for Portuguese ASR on the Common Voice 11.0 benchmark.
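For context on how a WER figure like 6.59% is computed, here is a small sketch with the `evaluate` library; the reference and prediction sentences are made up for illustration, not drawn from the evaluation set.

```python
# WER = (substitutions + insertions + deletions) / reference word count.
# The sentences below are illustrative, not from the evaluation set.
import evaluate

wer = evaluate.load("wer")
references = ["o gato dorme no sofá"]
predictions = ["o gato dorme no sofa"]  # one substituted word out of five
print(wer.compute(references=references, predictions=predictions))  # 0.2
```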
Q: What are the recommended use cases?
A: The model is well suited to Portuguese speech transcription tasks, including subtitle generation, audio content indexing, and voice command systems that require high accuracy in Portuguese; a timestamped transcription sketch for the subtitle case follows.
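As one illustration of the subtitle use case, the sketch below uses the chunked long-form mode of the `transformers` ASR pipeline with timestamps. The model ID and audio file name are assumptions, as before.

```python
# Subtitle-style transcription sketch: chunked long-form decoding with
# timestamps. Model ID and audio path are illustrative assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="pierreguillou/whisper-medium-portuguese",  # assumed Hub ID
    chunk_length_s=30,  # process long audio in 30 s chunks
)

result = asr("palestra_pt.mp3", return_timestamps=True)
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]  # (start_s, end_s); end may be None
    print(f"[{start} - {end}] {chunk['text']}")
```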