wav2vec2-large-xls-r-300m-ha-cv8
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Training Dataset | Common Voice 8 (Hausa) |
| Best WER | 36.295% (with LM) |
| Framework | PyTorch 1.10.0 |
What is wav2vec2-large-xls-r-300m-ha-cv8?
This model is a speech recognition system for the Hausa language, fine-tuned from Facebook's wav2vec2-xls-r-300m checkpoint on the Hausa portion of Common Voice 8. It achieves a Word Error Rate (WER) of 36.295% when decoding with a language model.
Implementation Details
The model was trained for 100 epochs with the Adam optimizer and a cosine learning rate schedule with restarts and warmup. Training used a batch size of 32 with 2 gradient accumulation steps, which gives an effective batch size of 64 per device.
- Learning rate: 0.0001 with a cosine-with-restarts schedule
- Warmup steps: 1000
- Evaluation metrics: WER (36.295%) and CER (11.073%)
- Training framework: Transformers 4.16.1 with PyTorch
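The schedule and batch settings above can be sketched in plain Python. This is a minimal illustration, not the author's training script: the learning rate, warmup steps, batch size, and accumulation steps come from the text, while `TOTAL_STEPS` and `NUM_CYCLES` are hypothetical values the source does not state. The curve follows the usual "linear warmup, then cosine decay with hard restarts" shape, similar to what Transformers' `get_cosine_with_hard_restarts_schedule_with_warmup` produces.

```python
import math

LEARNING_RATE = 1e-4     # from the hyperparameter list
WARMUP_STEPS = 1000      # from the hyperparameter list
PER_DEVICE_BATCH = 32    # from the text
GRAD_ACCUM_STEPS = 2     # from the text
TOTAL_STEPS = 10_000     # hypothetical: the total step count is not stated in the source
NUM_CYCLES = 1           # hypothetical: the number of restart cycles is not stated


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Number of samples that contribute to a single optimizer update."""
    return per_device * accum_steps * devices


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine with hard restarts."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cosine cycle, in [0, 1).
    cycle_pos = (NUM_CYCLES * progress) % 1.0
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
    for step in (0, 500, 1000, 5500, 10_000):
        print(step, lr_at(step))
```

With a single device, the effective batch size is 32 × 2 = 64. Raising `NUM_CYCLES` makes the rate jump back to its peak at the start of each cycle.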
Core Capabilities
- Automatic speech recognition for the Hausa language
- Supports inference with or without a language model
- Operates on 16kHz audio; 48kHz recordings (as in Common Voice) are resampled to 16kHz before inference
- Batched inference with a CTC-based architecture
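To illustrate what "regular" (no language model) inference means for a CTC-based model, here is a minimal sketch of greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop blanks. The toy vocabulary, the blank index `BLANK_ID`, and the `|` word delimiter are assumptions for illustration; the actual model's vocabulary is not given in the source.

```python
from itertools import groupby
from typing import Mapping, Sequence

BLANK_ID = 0          # assumption: index of the CTC blank token
WORD_DELIMITER = "|"  # assumption: token used to mark word boundaries


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    id_to_char: Mapping[int, str],
    blank_id: int = BLANK_ID,
) -> str:
    """Turn per-frame argmax token ids into text using the CTC collapse rule."""
    # 1. Merge runs of identical ids (repeats within a run are one emission).
    collapsed = [token for token, _ in groupby(frame_ids)]
    # 2. Remove blanks, which only separate genuine repeated characters.
    chars = [id_to_char[token] for token in collapsed if token != blank_id]
    # 3. Map the word delimiter to a space.
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# Toy vocabulary and frame sequence (hypothetical, for illustration only).
vocab = {0: "<pad>", 1: "|", 2: "s", 3: "a", 4: "n", 5: "u"}
frames = [2, 2, 0, 3, 4, 0, 4, 5, 5, 1]

print(ctc_greedy_decode(frames, vocab))  # sannu
```

The blank between the two `n` frames is what keeps the double letter; without it the repeats would collapse into one. Language model-enhanced inference replaces this per-frame argmax with a beam search that also scores candidate word sequences.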
Frequently Asked Questions
Q: What makes this model unique?
This model targets Hausa speech recognition and is built on XLS-R, a multilingual pretrained speech model, making it one of relatively few ASR models available for African languages. Language model integration reduces WER from 47.821% to 36.295%, an absolute drop of about 11.5 points (roughly 24% relative).
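For readers unfamiliar with the metric, the sketch below shows how WER is typically computed (word-level edit distance divided by the number of reference words) and works out the improvement from the two figures quoted above. The function name and the example sentences are illustrative placeholders, not taken from the model's evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Figures quoted in the text.
WER_WITHOUT_LM = 47.821
WER_WITH_LM = 36.295

absolute_drop = WER_WITHOUT_LM - WER_WITH_LM    # percentage points
relative_drop = absolute_drop / WER_WITHOUT_LM  # fraction of the original error

print(f"absolute: {absolute_drop:.3f} points")
print(f"relative: {relative_drop:.1%}")
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

The relative figure works out to about 24.1%, meaning the language model removes roughly a quarter of the word errors made by plain decoding.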
Q: What are the recommended use cases?
The model is suited to Hausa speech transcription, particularly automated transcription of Hausa audio content. It can serve both academic research and practical speech processing applications for Hausa-speaking communities.