wav2vec2-large-xls-r-300m-ha-cv8

Property	Value
License	Apache 2.0
Training Dataset	Common Voice 8 (Hausa)
Best WER	36.295% (with LM)
Framework	PyTorch 1.10.0

What is wav2vec2-large-xls-r-300m-ha-cv8?

This model is a speech recognition system for the Hausa language, fine-tuned from Facebook's wav2vec2-xls-r-300m checkpoint on Common Voice 8. It achieves a Word Error Rate (WER) of 36.295% when decoding with a language model.

Implementation Details

The model was trained for 100 epochs with the Adam optimizer and a cosine learning rate schedule with restarts, preceded by warmup steps. Training used a batch size of 32 with 2 gradient accumulation steps, which on a single device gives an effective batch size of 32 × 2 = 64 samples per optimizer update.

Learning rate: 0.0001, cosine schedule with restarts
Warmup steps: 1000
Evaluation metrics: WER (36.295%) and CER (11.073%)
Training framework: Transformers 4.16.1 with PyTorch
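
The hyperparameters above can be sketched as a small learning rate function. This is a minimal illustration, not the original training script: the learning rate, warmup steps, batch size, and accumulation steps come from this page, while TOTAL_STEPS and NUM_CYCLES are hypothetical values chosen only to make the example runnable. It also assumes the "restart" variant is a hard restart, where the rate jumps back to its peak at the start of each cycle.

```python
import math

# Values taken from this page.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 1000
BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 2

# Hypothetical values for illustration; not stated in this document.
TOTAL_STEPS = 10_000
NUM_CYCLES = 2


def effective_batch_size(batch_size=BATCH_SIZE, accum_steps=GRAD_ACCUM_STEPS):
    """Samples contributing to one optimizer update on a single device."""
    return batch_size * accum_steps


def lr_at(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS,
          total=TOTAL_STEPS, cycles=NUM_CYCLES):
    """Linear warmup followed by cosine decay with hard restarts."""
    if step < warmup:
        # Ramp linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cycle, in [0, 1).
    cycle_pos = (cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for s in (0, 500, 1000, 3250, 5500, 10_000):
        print(s, lr_at(s))
```

With these assumed values, the rate climbs to 0.0001 at step 1000, decays along a cosine curve, and returns to 0.0001 at step 5500 where the second cycle begins. The shape mirrors the hard-restart cosine schedule available in Transformers (get_cosine_with_hard_restarts_schedule_with_warmup); whether training used exactly that variant is an assumption.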

Core Capabilities

Automatic speech recognition for the Hausa language
Supports inference both with and without a language model
Works on 16 kHz audio; 48 kHz recordings are resampled to 16 kHz before inference
Batch inference with a CTC-based architecture
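
To make the CTC-based decoding concrete, here is a minimal sketch of greedy CTC decoding, which corresponds to inference without a language model. It is an illustration under stated assumptions rather than the model's actual decoding code: the toy vocabulary, the blank id of 0, and the "|" word delimiter are hypothetical choices for the example, and the frame ids stand in for the per-frame argmax of the model's output.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join into text."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame.
        if token_id != previous:
            collapsed.append(token_id)
        previous = token_id
    # Blanks separate genuine repeats (e.g. "nn") and are removed afterwards.
    chars = [id_to_token[i] for i in collapsed if i != blank_id]
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; the real model has its own vocabulary.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "s", 3: "a", 4: "n", 5: "u"}

# Stand-in for the per-frame argmax over the model's output.
TOY_FRAMES = [2, 2, 0, 3, 3, 4, 0, 4, 4, 5, 0, 1, 1, 2, 3, 4, 0, 4, 5]

if __name__ == "__main__":
    print(ctc_greedy_decode(TOY_FRAMES, TOY_VOCAB))
```

The blank between the two runs of "n" is what lets the decoder keep a doubled letter. Language model-enhanced inference replaces this greedy step with a beam search that rescores candidate transcripts.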

Frequently Asked Questions

Q: What makes this model unique?

This model targets Hausa speech recognition and is built on the multilingual XLS-R architecture, which makes it one of relatively few ASR models available for African languages. Language model integration improves its accuracy substantially, reducing WER from 47.821% to 36.295%, a drop of about 11.5 percentage points.
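
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER (word-level edit distance divided by the number of reference words) and then works out the relative improvement implied by the two figures above. The function name and the toy sentences are illustrative assumptions; the evaluation tooling actually used for this model may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edits needed to turn the first i-1 ref words into hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Figures reported on this page.
WER_WITHOUT_LM = 47.821
WER_WITH_LM = 36.295

absolute_reduction = WER_WITHOUT_LM - WER_WITH_LM
relative_reduction = absolute_reduction / WER_WITHOUT_LM

if __name__ == "__main__":
    # Toy strings for illustration only.
    print(word_error_rate("ina kwana lafiya", "ina kwana"))
    print(f"absolute: {absolute_reduction:.3f} points")
    print(f"relative: {relative_reduction:.1%}")
```

In relative terms, the language model removes roughly 24% of the word errors made by the acoustic model alone.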

Q: What are the recommended use cases?

The model is intended for automated transcription of Hausa audio. It is suitable both for academic research and for practical speech processing applications serving Hausa-speaking communities.