xls-r-uzbek-cv8

Maintained by: lucio

XLS-R Uzbek Speech Recognition Model

Base Model: facebook/wav2vec2-xls-r-300m
Training Dataset: Mozilla Common Voice 8.0 - UZ
WER Score: 38.52%
CER Score: 7.77%
Model Hub: Hugging Face

What is xls-r-uzbek-cv8?

xls-r-uzbek-cv8 is an automatic speech recognition (ASR) model fine-tuned for Uzbek. It is built on Facebook's wav2vec2-xls-r-300m checkpoint, adds a KenLM language model for decoding, and transcribes into the Modern Latin Uzbek alphabet. It achieves a Word Error Rate (WER) of 38.52% and a Character Error Rate (CER) of 7.77% on validation data.
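To make the two metrics concrete, here is a minimal sketch of how WER and CER are conventionally computed from edit distance. The example sentences are made up for illustration and are not taken from the model's evaluation data, and the function names are hypothetical. Reported scores are normally aggregated over a whole evaluation set (total errors divided by total reference length) rather than per sentence.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one non-empty reference sentence."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one non-empty reference sentence."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one wrong character makes one of four words wrong.
reference = "men bugun maktabga bordim"
hypothesis = "men bugun maktabda bordim"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

A single character error counts as a full word error, which is one reason a CER of 7.77% can coexist with a much higher WER of 38.52%.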

Implementation Details

The model was trained with the Adam optimizer at a learning rate of 3e-05, a batch size of 128, and 100 epochs. Training used native AMP mixed precision and a linear learning rate scheduler with 500 warmup steps.
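The learning rate schedule described above can be sketched as follows, assuming the usual shape of a linear scheduler with warmup: a linear ramp from zero to the base rate, then a linear decay to zero at the end of training. The total number of training steps is not given in the source, so `total_steps` is a hypothetical parameter here, as is the function name.

```python
BASE_LR = 3e-5       # learning rate stated above
WARMUP_STEPS = 500   # warmup steps stated above


def linear_lr(step: int, total_steps: int,
              base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at a given optimizer step (assumed warmup + linear decay)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=10_000 is a made-up value for illustration only.
for step in (0, 250, 500, 5_000, 10_000):
    print(step, linear_lr(step, total_steps=10_000))
```

The warmup keeps early updates small while the optimizer statistics settle, which is a common choice when fine-tuning a large pretrained encoder.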

  • Vocabulary optimized for Modern Latin Uzbek alphabet
  • KenLM language model integration
  • Custom handling of specific Uzbek characters like <‘> and <’>
  • Trained on 50% of the official Common Voice split
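The source does not say how the special Uzbek characters are handled, so the following is only an illustrative sketch of one plausible text normalization step. It assumes the two characters are the mark used in the letters o‘ and g‘ and the mark used for the tutuq belgisi (glottal stop), and that look-alike variants are collapsed to two canonical code points. The choice of U+2018 and U+2019 as targets, the variant lists, and the function name are all assumptions.

```python
import re

OPENING = "\u2018"  # assumed canonical mark for the letters o‘ and g‘
CLOSING = "\u2019"  # assumed canonical mark for the tutuq belgisi

# Look-alike characters mapped to the assumed canonical marks.
_TABLE = str.maketrans({
    "\u02bb": OPENING,  # modifier letter turned comma
    "\u0060": OPENING,  # grave accent
    "\u02bc": CLOSING,  # modifier letter apostrophe
    "\u00b4": CLOSING,  # acute accent
})

# Heuristic: an ASCII apostrophe right after o or g is treated as part of the letter.
_ASCII_AFTER_O_OR_G = re.compile(r"(?<=[oOgG])'")


def normalize_apostrophes(text: str) -> str:
    """Collapse apostrophe-like characters to two canonical marks (a sketch)."""
    text = text.translate(_TABLE)
    text = _ASCII_AFTER_O_OR_G.sub(OPENING, text)
    # Any remaining ASCII apostrophe is treated as a tutuq belgisi.
    return text.replace("'", CLOSING)


print(normalize_apostrophes("o'zbek ma'no"))
```

The ASCII heuristic is imperfect: in words where a tutuq belgisi follows the letter o, it would pick the wrong mark, so a real pipeline would need a word list or manual review for those cases.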

Core Capabilities

  • Automatic Speech Recognition for Uzbek audio
  • Draft video caption generation
  • Broadcast content indexing
  • Basic transcription services
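A minimal sketch of how transcription might look with the Hugging Face `transformers` ASR pipeline. The Hub ID `lucio/xls-r-uzbek-cv8` is an assumption pieced together from the maintainer and model names above, and `transcribe` is a hypothetical helper. Decoding a file path typically requires ffmpeg, and using the KenLM language model during decoding may require the `pyctcdecode` and `kenlm` packages.

```python
MODEL_ID = "lucio/xls-r-uzbek-cv8"  # assumed Hub ID (maintainer/model name)


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return a draft transcript for one audio file (sketch, not production code)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Process long recordings in 30-second chunks to bound memory use.
    result = asr(audio_path, chunk_length_s=30)
    return result["text"]
```

For batch work such as caption drafting or content indexing, it would be cheaper to build the pipeline once and reuse it across files rather than recreating it on every call as this helper does.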

Frequently Asked Questions

Q: What makes this model unique?

This model is optimized for Uzbek speech recognition, with a custom vocabulary and an integrated language model. It is one of the few models trained specifically for Uzbek ASR.

Q: What are the recommended use cases?

The model is best suited for draft video captioning and broadcast content indexing. It is not recommended for live captioning or accessibility purposes because of its accuracy limitations (38.52% WER). Users should also respect the privacy of Common Voice dataset contributors.
