Whisper Medium Korean
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Dataset | Zeroth Korean |
| WER Score | 3.64% |
| CER Score | 1.48% |
What is whisper-medium-ko-zeroth?
Whisper-medium-ko-zeroth is a fine-tuned version of OpenAI's Whisper-medium model, specifically optimized for Korean speech recognition. Developed by seastar105, this model demonstrates exceptional performance on the Zeroth Korean dataset, achieving a Word Error Rate (WER) of 3.64% and Character Error Rate (CER) of 1.48%.
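For quick experimentation, the checkpoint can be loaded with the Hugging Face transformers ASR pipeline. The sketch below assumes the model is published on the Hub under the id `seastar105/whisper-medium-ko-zeroth` and that you have a local Korean audio file; adjust both to your setup.

```python
# Minimal inference sketch; the Hub id and the audio path are assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="seastar105/whisper-medium-ko-zeroth",  # assumed Hub id
    generate_kwargs={"language": "korean", "task": "transcribe"},
)

result = asr("korean_sample.wav")  # path to a local audio file
print(result["text"])
```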
Implementation Details
The model was trained with PyTorch using mixed-precision training (native AMP). Training ran for 5000 steps with a linear learning rate scheduler and 500 warmup steps, optimized with Adam (betas=(0.9, 0.999), epsilon=1e-08) at a learning rate of 5e-06. Key settings are listed below, followed by a sketch of equivalent training arguments.
- Batch size: 16 (effective, with gradient accumulation steps of 2)
- Training duration: 3.59 epochs
- Progressive improvement from initial WER of 7.75% to final 3.64%
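These hyperparameters map directly onto `Seq2SeqTrainingArguments` from transformers. The sketch below is a reconstruction from the values listed above, not the author's actual training script; the output directory and the per-device batch size (assuming a single GPU, so 8 × 2 accumulation steps = 16 effective) are assumptions.

```python
# Reconstruction of the listed hyperparameters; output_dir and the single-GPU
# batch-size split are assumptions, not the author's actual configuration.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-ko-zeroth",  # placeholder
    per_device_train_batch_size=8,            # 8 x 2 accumulation = 16 effective
    gradient_accumulation_steps=2,
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                                # mixed-precision training (native AMP)
    predict_with_generate=True,
    evaluation_strategy="steps",
)
```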
Core Capabilities
- High-accuracy Korean speech recognition
- Optimized for the Zeroth Korean dataset
- Efficient processing with mixed-precision support
- Demonstrated stability in training with consistent performance improvements
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its performance on Korean speech recognition, achieving a remarkably low WER of 3.64% and CER of 1.48% on the Zeroth Korean dataset. That accuracy makes it particularly effective for Korean ASR applications.
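The reported numbers can in principle be checked with the `evaluate` library. The sketch below shows one way to do so, assuming the Zeroth Korean test split is available on the Hub; the dataset id `kresnik/zeroth_korean`, its column names, and the model Hub id are assumptions, and only a small subset is scored for brevity.

```python
# Sketch of a WER/CER check; the dataset id, column names, and Hub id are assumptions.
import evaluate
from datasets import load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="seastar105/whisper-medium-ko-zeroth",
    generate_kwargs={"language": "korean", "task": "transcribe"},
)

test_set = load_dataset("kresnik/zeroth_korean", split="test")
wer = evaluate.load("wer")
cer = evaluate.load("cer")

predictions, references = [], []
for sample in test_set.select(range(100)):  # small subset for a quick check
    audio = sample["audio"]
    pred = asr({"array": audio["array"], "sampling_rate": audio["sampling_rate"]})
    predictions.append(pred["text"])
    references.append(sample["text"])

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```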
Q: What are the recommended use cases?
The model is well suited to Korean automatic speech recognition tasks, particularly applications that require high-accuracy transcription. It is usable in both academic research and production environments, given its Apache 2.0 license and strong performance metrics.