whisper-omg-2

nurzhanit

Fine-tuned Whisper model achieving a 0.0266 WER on Common Voice 11.0, trained with the Adam optimizer for 500 steps under a linear learning-rate schedule

Property          Value
Base Model        Whisper Enhanced ML
Training Dataset  Common Voice 11.0
Best WER          0.0266
Framework         PyTorch 2.5.0

What is whisper-omg-2?

Whisper-OMG-2 is a fine-tuned version of the Whisper Enhanced ML model, specifically optimized for speech recognition tasks. Developed by nurzhanit, this model demonstrates exceptional performance with a Word Error Rate (WER) of just 0.0266 on the evaluation set.

Implementation Details

The model was trained using the Adam optimizer with carefully tuned hyperparameters (β1=0.9, β2=0.999, ε=1e-08). The training process involved a linear learning rate scheduler with 50 warmup steps and continued for 500 training steps. The learning rate was set to 1e-05, with batch sizes of 16 for training and 8 for evaluation.
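The schedule described above (linear warmup for 50 steps, then linear decay, peaking at 1e-05 over 500 total steps) can be sketched in a few lines. This is a minimal illustration, not the training code; the assumption that decay runs to zero at the final step follows the default behavior of Hugging Face's linear scheduler.

```python
# Minimal sketch of the linear warmup + linear decay schedule described above.
# Assumes decay reaches zero at the final step (as in the default Hugging Face
# "linear" scheduler); that endpoint is an assumption, not stated in the card.

PEAK_LR = 1e-05
WARMUP_STEPS = 50
TOTAL_STEPS = 500

def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from the peak back down to 0 at TOTAL_STEPS.
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
```

For example, `learning_rate(50)` returns the peak value 1e-05, and the rate falls linearly from there until step 500.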

  • Achieved final validation loss of 0.0002
  • Implemented with Transformers 4.40.0
  • Uses PyTorch 2.5.0+cu124 backend

Core Capabilities

  • Superior speech recognition accuracy (0.0266 WER)
  • Efficient training convergence within 500 steps
  • Stable validation loss in the later stages of training
  • Optimized for Common Voice dataset applications
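For context on the headline metric: Word Error Rate is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words, so a 0.0266 WER means roughly 2.7 errors per 100 words. The sketch below illustrates the metric itself; it is not the evaluation script used for this model.

```python
# Minimal WER illustration: word-level Levenshtein distance divided by the
# number of reference words. A sketch of the metric, not this model's
# evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between the first i reference words
    # and the first j hypothesis words (rolling 1-D dynamic programming).
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (r != h))    # substitution (or match)
            prev = cur
    return dp[-1] / len(ref)
```

For instance, `wer("the cat sat", "the cat sit")` yields one substitution over three reference words, i.e. about 0.333.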

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its exceptional WER of 0.0266, achieved through careful optimization and fine-tuning on the Common Voice 11.0 dataset. The training progression shows remarkable stability after step 300.

Q: What are the recommended use cases?

This model is particularly suited for speech recognition on audio similar to the Common Voice dataset, and it targets production environments running PyTorch 2.5.0.
