whisper-large-v3-ru-podlodka

bond005

A 1.54B-parameter Russian speech recognition model based on Whisper Large V3 and fine-tuned on Russian datasets, achieving roughly 10-11% WER when punctuation is ignored.

Property	Value
Parameter Count	1.54B
License	Apache 2.0
Tensor Type	F32
Author	bond005

What is whisper-large-v3-ru-podlodka?

whisper-large-v3-ru-podlodka is a Russian speech recognition model built on OpenAI's Whisper Large V3 architecture. It was fine-tuned on several Russian datasets, including Taiga Speech V2, Podlodka Speech, and Russian LibriSpeech.

Implementation Details

The model is a transformer with 1.54 billion parameters. It runs on PyTorch, with weights stored in the Safetensors format. It reaches a Word Error Rate (WER) of 10.987% without punctuation on the Podlodka.io dataset and 9.795% on Russian LibriSpeech.

Trained on multiple high-quality Russian speech datasets
Optimized for both clean and noisy audio conditions
Supports punctuation and capitalization in transcriptions
Built on the Whisper Large V3 architecture
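
As a rough illustration of how a checkpoint like this is typically used, here is a minimal sketch based on the Hugging Face transformers speech recognition pipeline. The Hub ID "bond005/whisper-large-v3-ru-podlodka" is an assumption pieced together from the author and model names above, and the helper names build_generate_kwargs and transcribe are hypothetical. The heavy imports sit inside the function so the snippet can be read and imported without downloading anything.

```python
# Assumed Hub ID, inferred from the author and model name; verify before use.
MODEL_ID = "bond005/whisper-large-v3-ru-podlodka"


def build_generate_kwargs(language: str = "russian") -> dict:
    """Decoding options that pin Whisper to Russian transcription."""
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path: str, device: str = "cpu") -> str:
    """Transcribe one audio file (hypothetical helper, not the author's code).

    Requires torch and transformers to be installed; decoding audio from a
    file path also relies on ffmpeg being available on the system.
    """
    from transformers import pipeline  # imported lazily: large dependency

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        device=device,
        chunk_length_s=30,  # split long recordings into 30-second windows
    )
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

Calling transcribe("episode.wav") with a real file path would return the transcript as a single string. Since the weights are listed as F32 with 1.54B parameters, expect roughly 6 GB of memory for the parameters alone.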

Core Capabilities

High-accuracy Russian speech recognition
Robust performance in various acoustic conditions
Support for punctuation and case-sensitive transcription
Achieves ~21% WER with punctuation and ~11% WER without punctuation
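
The gap between the two WER figures comes from how strictly the transcript is compared with the reference. The sketch below shows one common way to compute both variants: a word-level edit distance divided by the reference length, applied once to the raw text and once after lowercasing and stripping punctuation. The normalization rules here are an assumption for illustration; the source does not state which rules the author used.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and replace punctuation with spaces (assumed normalization)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "Привет, мир! Как дела?"
hypothesis = "привет мир как дела"

wer_with_punct = wer(reference, hypothesis)
wer_without_punct = wer(normalize(reference), normalize(hypothesis))
```

In this toy example every word differs from the reference only by case or attached punctuation, so the strict comparison counts all four words as errors while the normalized comparison counts none.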

Frequently Asked Questions

Q: What makes this model unique?

The model is tuned specifically for Russian and was trained on several Russian speech datasets, which makes it better suited to Russian speech recognition than a general-purpose multilingual checkpoint.

Q: What are the recommended use cases?

The model suits Russian speech transcription where accuracy matters, such as podcast transcription, meeting recordings, and general speech-to-text applications that need Russian language support.