whisper-base-fine_tuned-ru

Maintained By
artyomboyko

Property         Value
Parameter Count  72.6M
License          Apache 2.0
Base Model       openai/whisper-base
WER Score        41.22%

What is whisper-base-fine_tuned-ru?

whisper-base-fine_tuned-ru is an automatic speech recognition (ASR) model for the Russian language. It is built on OpenAI's Whisper base architecture and fine-tuned on the Mozilla Common Voice 11.0 dataset to improve Russian speech transcription.

Implementation Details

The model uses a transformer-based architecture with 72.6M parameters stored as F32 tensors. Fine-tuning was done in PyTorch with native AMP mixed-precision training and an Adam optimizer (β1=0.9, β2=0.999, ε=1e-08).

  • Effective training batch size: 16 (batch size 4 × 4 gradient accumulation steps)
  • Learning rate: 1e-06 with a linear scheduler
  • Training steps: 20,000, including 250 warmup steps
  • Final validation loss: 0.4553
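
The schedule implied by these settings can be sketched in a few lines of plain Python. This is only an illustration. It assumes the "linear scheduler" warms up linearly from zero over the 250 warmup steps and then decays linearly to zero at step 20,000, which is the usual behaviour of linear warmup schedules but is not confirmed by the source. The function names are hypothetical.

```python
# Values taken from the training settings listed above.
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
PEAK_LR = 1e-6
WARMUP_STEPS = 250
TOTAL_STEPS = 20_000


def effective_batch_size(batch_size=BATCH_SIZE, accum_steps=GRAD_ACCUM_STEPS):
    """Number of samples contributing to each optimizer update."""
    return batch_size * accum_steps


def linear_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then linear decay to 0
    at the final step. The source only says "linear scheduler".
    """
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    remaining = max(0, total - step)
    return peak_lr * remaining / max(1, total - warmup)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for s in (0, 125, 250, 10_000, 20_000):
        print(f"step {s:>6}: lr = {linear_lr(s):.3e}")
```

Under these assumptions the learning rate peaks at 1e-06 at step 250 and reaches zero at step 20,000.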

Core Capabilities

  • Russian speech recognition with 41.22% WER
  • Optimized for Russian language audio transcription
  • Compatible with standard Whisper inference pipelines
  • Supports TensorBoard logging for monitoring
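
Because the model works with standard Whisper inference pipelines, transcription with the Hugging Face transformers library might look like the sketch below. The repository id is an assumption pieced together from the maintainer and model names on this page, and the helper names `build_transcriber` and `transcribe` are hypothetical. Loading the model needs `transformers`, a PyTorch install, and typically ffmpeg for decoding audio files.

```python
# Assumed Hub repository id (maintainer name + model name from this page).
MODEL_ID = "artyomboyko/whisper-base-fine_tuned-ru"


def build_transcriber(model_id=MODEL_ID):
    """Create an ASR pipeline; imported lazily so the module loads without transformers."""
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path, transcriber=None):
    """Transcribe one audio file and return the recognized text."""
    if transcriber is None:
        transcriber = build_transcriber()
    # Ask Whisper to transcribe (not translate) and to treat the audio as Russian.
    result = transcriber(
        audio_path,
        generate_kwargs={"language": "russian", "task": "transcribe"},
    )
    return result["text"].strip()
```

Calling `transcribe("sample.wav")` (a placeholder file name) downloads the model on first use. Passing a pre-built `transcriber` avoids reloading it for every file.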

Frequently Asked Questions

Q: What makes this model unique?

This model is tuned specifically for Russian ASR. Over the course of fine-tuning on Russian speech data, its WER fell from 71.67% to 41.22%, an absolute reduction of 30.45 percentage points.
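
For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal plain-Python version for illustration. It is not the evaluation script behind the figures on this page, and it skips the text normalization (casing, punctuation) that real evaluations usually apply.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before, after):
    """Relative improvement between two WER values given in the same units."""
    return (before - after) / before


if __name__ == "__main__":
    print(wer("привет как дела", "привет так дела"))
    print(f"{relative_reduction(71.67, 41.22):.1%}")
```

Applied to the two WER figures quoted here, `relative_reduction(71.67, 41.22)` gives a relative reduction of roughly 42%.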

Q: What are the recommended use cases?

The model is best suited for Russian speech transcription tasks, particularly in applications requiring automatic subtitling, transcription services, or voice command systems for Russian-speaking users.
