xls-r-uzbek-cv8

lucio

XLS-R-300M model fine-tuned for Uzbek speech recognition, reaching 38.52% WER. Trained on Common Voice 8.0 and paired with a KenLM language model.

Property	Value
Base Model	facebook/wav2vec2-xls-r-300m
Training Dataset	Mozilla Common Voice 8.0 - UZ
WER Score	38.52%
CER Score	7.77%
Model Hub	Hugging Face

What is xls-r-uzbek-cv8?

xls-r-uzbek-cv8 is a speech recognition model for the Uzbek language. It fine-tunes Facebook's pretrained wav2vec2-xls-r-300m checkpoint, adds a KenLM language model for decoding, and transcribes into the Modern Latin Uzbek alphabet. On validation data it reaches a Word Error Rate (WER) of 38.52% and a Character Error Rate (CER) of 7.77%.
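To make the two metrics concrete, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length in words or characters. The function names and the sample sentence are illustrative assumptions, not the evaluation script used for this model, and this CER variant counts spaces as characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair: one wrong character in one word.
reference = "men bugun maktabga bordim"
hypothesis = "men bugun maktabda bordim"
print(f"WER = {wer(reference, hypothesis):.2f}, CER = {cer(reference, hypothesis):.2f}")
```

A single wrong character makes the whole word count as an error, which is why a model's WER is usually several times its CER, as with the 38.52% and 7.77% reported here.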

Implementation Details

Training used the Adam optimizer with a learning rate of 3e-05, a batch size of 128, and 100 epochs. It also used native AMP mixed-precision training and a linear learning rate scheduler with 500 warmup steps.
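The scheduler settings above can be sketched as follows, assuming the common "linear warmup, then linear decay to zero" shape. The peak rate of 3e-05 and the 500 warmup steps come from the text; `total_steps` is a hypothetical value, since the total number of optimizer steps is not stated here.

```python
def linear_schedule_lr(step, total_steps, base_lr=3e-5, warmup_steps=500):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over warmup_steps, then falls
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=10_000 is an assumed figure for illustration only.
for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_schedule_lr(step, total_steps=10_000))
```

The warmup keeps early updates small while the freshly initialized output layer settles, and the decay reduces the step size as training converges.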

Vocabulary optimized for Modern Latin Uzbek alphabet
KenLM language model integration
Custom handling of the Uzbek-specific apostrophe characters <‘> and <’>
Trained on 50% of the official Common Voice split
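As an illustration of what vocabulary-oriented text cleaning for Uzbek can look like, the sketch below lowercases a transcript, maps the many apostrophe variants found in real text onto two canonical marks, and drops everything outside the alphabet. This is a hypothetical normalizer, not the preprocessing used for this model: the choice of U+2018 and U+2019 as canonical forms and the "apostrophe after o or g is the modifier" heuristic are assumptions.

```python
import re

# Apostrophe-like characters commonly seen in Uzbek Latin text.
APOSTROPHES = "'`\u2018\u2019\u02bb\u02bc"
MODIFIER = "\u2018"  # assumed canonical mark for the letters o‘ and g‘
GLOTTAL = "\u2019"   # assumed canonical mark for the tutuq belgisi

_ANY_APOSTROPHE = re.compile(f"[{APOSTROPHES}]")
_NOT_ALLOWED = re.compile(f"[^a-z{MODIFIER}{GLOTTAL} ]")


def _canonical(match):
    """Pick the canonical mark based on the character before the apostrophe."""
    text, i = match.string, match.start()
    follows_o_or_g = i > 0 and text[i - 1] in "og"
    return MODIFIER if follows_o_or_g else GLOTTAL


def normalize_uzbek(text):
    """Lowercase, unify apostrophes, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = _ANY_APOSTROPHE.sub(_canonical, text)
    text = _NOT_ALLOWED.sub(" ", text)
    return " ".join(text.split())


print(normalize_uzbek("O'zbekiston, go`zal!"))
print(normalize_uzbek("Ma'no"))
```

The heuristic cannot tell a glottal stop that happens to follow o or g from the modifier mark, so a production pipeline would likely need a word list or a more careful rule.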

Core Capabilities

Automatic Speech Recognition for Uzbek audio
Draft video caption generation
Broadcast content indexing
Basic transcription services
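For the draft captioning use case, a minimal sketch is shown below. It turns word-level chunks, in the shape the Hugging Face Transformers speech recognition pipeline returns when asked for word timestamps, into SRT cues. The hub id `lucio/xls-r-uzbek-cv8` is an assumption pieced together from the title and author line of this page, the helper names are hypothetical, and language-model decoding typically needs the `pyctcdecode` and `kenlm` packages installed.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks, max_words=7):
    """Group word-level chunks into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(chunks), max_words):
        group = chunks[i:i + max_words]
        start = group[0]["timestamp"][0]
        end = group[-1]["timestamp"][1]
        text = " ".join(chunk["text"].strip() for chunk in group)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)


def transcribe_words(audio_path, model_id="lucio/xls-r-uzbek-cv8"):
    """Run the model and return word-level chunks (model_id is an assumed hub id)."""
    from transformers import pipeline  # imported lazily; only needed for inference

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path, return_timestamps="word")["chunks"]


# Hand-written chunks in the pipeline's output shape, so no model download is needed.
sample = [
    {"text": "salom", "timestamp": (0.0, 0.5)},
    {"text": "dunyo", "timestamp": (0.6, 1.2)},
]
print(chunks_to_srt(sample))
```

With an audio file on disk, `chunks_to_srt(transcribe_words("clip.wav"))` would produce a draft caption file, which should still be reviewed by a person before publication.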

Frequently Asked Questions

Q: What makes this model unique?

The model targets Uzbek speech recognition directly, with a vocabulary tailored to the Modern Latin Uzbek alphabet and an integrated KenLM language model. It is one of the few models trained specifically for Uzbek ASR.

Q: What are the recommended use cases?

The model is best suited to draft video captioning and broadcast content indexing. It is not recommended for live captioning or accessibility purposes because of its accuracy limitations. Users should also respect the privacy of Common Voice dataset contributors.