Whisper Medium Korean
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Dataset | Zeroth Korean |
| WER Score | 3.64% |
| CER Score | 1.48% |
What is whisper-medium-ko-zeroth?
Whisper-medium-ko-zeroth is a fine-tuned version of OpenAI's Whisper-medium model, specifically optimized for Korean speech recognition. Developed by seastar105, this model demonstrates exceptional performance on the Zeroth Korean dataset, achieving a Word Error Rate (WER) of 3.64% and Character Error Rate (CER) of 1.48%.
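For quick experimentation, the checkpoint can be loaded with the Hugging Face transformers ASR pipeline. The sketch below assumes the model is published on the Hub under the id `seastar105/whisper-medium-ko-zeroth` and that you have a local Korean audio file; adjust both to your setup.

```python
# Minimal inference sketch; the Hub id and the audio path are assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="seastar105/whisper-medium-ko-zeroth",  # assumed Hub id
    generate_kwargs={"language": "korean", "task": "transcribe"},
)

result = asr("korean_sample.wav")  # path to a local audio file
print(result["text"])
```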
Implementation Details
The model was trained with PyTorch using mixed-precision training (native AMP). Training ran for 5000 steps with a linear learning rate scheduler and 500 warmup steps, optimized with Adam (betas=(0.9, 0.999), epsilon=1e-08) at a learning rate of 5e-06. Key settings are listed below, followed by a sketch of equivalent training arguments.
- Batch size: 16 (effective, with gradient accumulation steps of 2)
- Training duration: 3.59 epochs
- Progressive improvement from initial WER of 7.75% to final 3.64%
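These hyperparameters map directly onto `Seq2SeqTrainingArguments` from transformers. The sketch below is a reconstruction from the values listed above, not the author's actual training script; the output directory and the per-device batch size (assuming a single GPU, so 8 × 2 accumulation steps = 16 effective) are assumptions.

```python
# Reconstruction of the listed hyperparameters; output_dir and the single-GPU
# batch-size split are assumptions, not the author's actual configuration.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-ko-zeroth",  # placeholder
    per_device_train_batch_size=8,            # 8 x 2 accumulation = 16 effective
    gradient_accumulation_steps=2,
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                                # mixed-precision training (native AMP)
    predict_with_generate=True,
    evaluation_strategy="steps",
)
```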
Core Capabilities
- High-accuracy Korean speech recognition
- Optimized for the Zeroth Korean dataset
- Efficient processing with mixed-precision support
- Demonstrated stability in training with consistent performance improvements
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its performance on Korean speech recognition, achieving a remarkably low WER of 3.64% and CER of 1.48% on the Zeroth Korean dataset. That accuracy makes it particularly effective for Korean ASR applications.
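The reported numbers can in principle be checked with the `evaluate` library. The sketch below shows one way to do so, assuming the Zeroth Korean test split is available on the Hub; the dataset id `kresnik/zeroth_korean`, its column names, and the model Hub id are assumptions, and only a small subset is scored for brevity.

```python
# Sketch of a WER/CER check; the dataset id, column names, and Hub id are assumptions.
import evaluate
from datasets import load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="seastar105/whisper-medium-ko-zeroth",
    generate_kwargs={"language": "korean", "task": "transcribe"},
)

test_set = load_dataset("kresnik/zeroth_korean", split="test")
wer = evaluate.load("wer")
cer = evaluate.load("cer")

predictions, references = [], []
for sample in test_set.select(range(100)):  # small subset for a quick check
    audio = sample["audio"]
    pred = asr({"array": audio["array"], "sampling_rate": audio["sampling_rate"]})
    predictions.append(pred["text"])
    references.append(sample["text"])

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```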
Q: What are the recommended use cases?
The model is well suited to Korean automatic speech recognition tasks, particularly applications that require high-accuracy transcription. It is usable in both academic research and production environments, given its Apache 2.0 license and strong performance metrics.