whisper-omg-2

nurzhanit

Fine-tuned Whisper model achieving a 0.0266 WER on Common Voice 11.0, trained with the Adam optimizer for 500 steps under a linear learning-rate schedule

Property          Value
Base Model        Whisper Enhanced ML
Training Dataset  Common Voice 11.0
Best WER          0.0266
Framework         PyTorch 2.5.0

What is whisper-omg-2?

Whisper-OMG-2 is a fine-tuned version of the Whisper Enhanced ML model, specifically optimized for speech recognition tasks. Developed by nurzhanit, this model demonstrates exceptional performance with a Word Error Rate (WER) of just 0.0266 on the evaluation set.

Implementation Details

The model was trained using the Adam optimizer with carefully tuned hyperparameters (β1=0.9, β2=0.999, ε=1e-08). The training process involved a linear learning rate scheduler with 50 warmup steps and continued for 500 training steps. The learning rate was set to 1e-05, with batch sizes of 16 for training and 8 for evaluation.
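The schedule described above (linear warmup for 50 steps, then linear decay, peaking at 1e-05 over 500 total steps) can be sketched in a few lines. This is a minimal illustration, not the training code; the assumption that decay runs to zero at the final step follows the default behavior of Hugging Face's linear scheduler.

```python
# Minimal sketch of the linear warmup + linear decay schedule described above.
# Assumes decay reaches zero at the final step (as in the default Hugging Face
# "linear" scheduler); that endpoint is an assumption, not stated in the card.

PEAK_LR = 1e-05
WARMUP_STEPS = 50
TOTAL_STEPS = 500

def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from the peak back down to 0 at TOTAL_STEPS.
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
```

For example, `learning_rate(50)` returns the peak value 1e-05, and the rate falls linearly from there until step 500.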

  • Achieved final validation loss of 0.0002
  • Implemented with Transformers 4.40.0
  • Uses PyTorch 2.5.0+cu124 backend

Core Capabilities

  • Superior speech recognition accuracy (0.0266 WER)
  • Efficient training convergence within 500 steps
  • Stable validation loss in the later stages of training
  • Optimized for Common Voice dataset applications
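For context on the headline metric: Word Error Rate is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words, so a 0.0266 WER means roughly 2.7 errors per 100 words. The sketch below illustrates the metric itself; it is not the evaluation script used for this model.

```python
# Minimal WER illustration: word-level Levenshtein distance divided by the
# number of reference words. A sketch of the metric, not this model's
# evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between the first i reference words
    # and the first j hypothesis words (rolling 1-D dynamic programming).
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (r != h))    # substitution (or match)
            prev = cur
    return dp[-1] / len(ref)
```

For instance, `wer("the cat sat", "the cat sit")` yields one substitution over three reference words, i.e. about 0.333.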

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its exceptional WER of 0.0266, achieved through careful optimization and fine-tuning on the Common Voice 11.0 dataset. The training progression shows remarkable stability after step 300.

Q: What are the recommended use cases?

This model is particularly suited for speech recognition on audio similar to the Common Voice dataset, and it targets production environments running PyTorch 2.5.0.
