XLS-R Uzbek Speech Recognition Model
| Property | Value |
|---|---|
| Base Model | facebook/wav2vec2-xls-r-300m |
| Training Dataset | Mozilla Common Voice 8.0 - UZ |
| WER Score | 38.52% |
| CER Score | 7.77% |
| Model Hub | Hugging Face |
What is xls-r-uzbek-cv8?
xls-r-uzbek-cv8 is a speech recognition model fine-tuned for the Uzbek language. It is built on Facebook's wav2vec2-xls-r-300m base model, incorporates a KenLM language model, and transcribes into the Modern Latin Uzbek alphabet. On validation data it achieves a Word Error Rate (WER) of 38.52% and a Character Error Rate (CER) of 7.77%.
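To make the two metrics concrete, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. The function names are illustrative, and this is not necessarily the evaluation script used for the reported scores.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Because one wrong character makes the whole word count as an error, WER is typically much higher than CER for the same output, which is consistent with the gap between 38.52% and 7.77%.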
Implementation Details
The model was trained with the Adam optimizer at a learning rate of 3e-05, with a batch size of 128, for 100 epochs. Training used native AMP mixed precision and a linear learning rate scheduler with 500 warmup steps.
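The shape of a linear schedule with warmup can be sketched in a few lines. The peak rate (3e-05) and the 500 warmup steps come from the description above. The total step count is a hypothetical placeholder, since only the epoch count is given, and the decay to zero at the final step is an assumption about how the linear schedule ends.

```python
def linear_warmup_lr(step, peak_lr=3e-5, warmup_steps=500, total_steps=10_000):
    """Learning rate at a given optimizer step.

    total_steps is a hypothetical value; the real number depends on the
    dataset size, the batch size, and the number of epochs.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at total_steps (assumed).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

With these placeholder values, the rate peaks at step 500 and is back at half the peak midway through the decay phase, at step 5250.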
- Vocabulary optimized for Modern Latin Uzbek alphabet
- KenLM language model integration
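- Custom handling of the Uzbek characters <‘> (the mark that modifies o and g, as in o‘ and g‘) and <’> (the tutuq belgisi), which are treated as letters rather than punctuation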
- Trained on 50% of the official Common Voice train split
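The character handling in the list above could look roughly like the following text normalization step. This is a sketch under stated assumptions, not the model's actual preprocessing: it assumes the two marks are U+2018 and U+2019, that any apostrophe-like character directly after o or g is the modifier mark, and that everything that is not a letter is dropped. The set `APOSTROPHE_LIKE` and the function name are illustrative.

```python
# Characters often typed in place of the two Uzbek marks (assumed, not exhaustive).
APOSTROPHE_LIKE = {"'", "`", "\u2018", "\u2019", "\u02bb", "\u02bc"}
MODIFIER = "\u2018"  # left single quotation mark, modifies o and g
TUTUQ = "\u2019"     # right single quotation mark, tutuq belgisi


def normalize_uzbek(text):
    """Lowercase, unify apostrophe-like marks, and drop punctuation."""
    out = []
    for ch in text.lower():
        if ch in APOSTROPHE_LIKE:
            prev = out[-1] if out else ""
            # Heuristic: after o or g the mark is the modifier, otherwise tutuq.
            out.append(MODIFIER if prev in ("o", "g") else TUTUQ)
        elif ch.isalpha():
            out.append(ch)
        else:
            # Punctuation, digits, and whitespace all become a separator.
            out.append(" ")
    # Collapse runs of spaces and trim the ends.
    return " ".join("".join(out).split())
```

The heuristic misclassifies a tutuq belgisi that happens to follow o or g, and it turns quotation marks written with an apostrophe into tutuq marks, so real transcripts would need a more careful rule.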
Core Capabilities
- Automatic Speech Recognition for Uzbek audio
- Draft video caption generation
- Broadcast content indexing
- Basic transcription services
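For the capabilities listed above, a minimal transcription helper might look like the sketch below. It uses the `automatic-speech-recognition` pipeline from the Transformers library. The helper name and its parameters are illustrative, and `model_id` is left as an argument because the model's Hub repository ID is not stated here. Decoding an audio file this way typically requires ffmpeg, and KenLM-based decoding is assumed to need the `pyctcdecode` and `kenlm` packages.

```python
def transcribe(audio_path, model_id, chunk_length_s=30):
    """Transcribe one audio file with a Hugging Face ASR pipeline.

    audio_path: path to a local audio file.
    model_id: Hub repository ID of the model (supplied by the caller).
    chunk_length_s: window length used to split long recordings.
    """
    # Imported lazily so that defining this helper does not require
    # transformers and torch to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, chunk_length_s=chunk_length_s)
    return result["text"]
```

Chunking keeps memory use bounded on long broadcast recordings. The 30-second default is an arbitrary starting point, not a recommended setting for this model.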
Frequently Asked Questions
Q: What makes this model unique?
This model is optimized for Uzbek speech recognition, with a vocabulary tailored to the Modern Latin Uzbek alphabet and an integrated KenLM language model. It is one of the few models trained specifically for Uzbek ASR.
Q: What are the recommended use cases?
The model is best suited for draft video captioning and broadcast content indexing. It's not recommended for live captioning or accessibility purposes due to accuracy limitations. Users should also respect privacy considerations regarding Common Voice dataset contributors.