wav2vec2-xls-r-300m-wolof-lm

Property	Value
Base Model	facebook/wav2vec2-xls-r-300m
Training Data	16.8 hours (10,000 audio files)
Best WER	21.26%
Author	abdouaziiz
Model Hub	Hugging Face

What is wav2vec2-xls-r-300m-wolof-lm?

This is a speech recognition model for Wolof, a language spoken primarily in Senegal and neighboring countries. It is a notable step for low-resource language processing: it was built by fine-tuning the XLS-R 300M model on the ALFFA_PUBLIC dataset and pairing it with a custom language model for decoding.

Implementation Details

The model was trained with a learning rate of 1e-4, the Adam optimizer, and a linear learning-rate schedule. Training ran for 10 epochs with evaluation every 1,500 steps, and the Word Error Rate (WER) on the evaluation set fell steadily from 54.39% to 21.26%.
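WER, the metric behind the 54.39% and 21.26% figures, is the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference, divided by the number of reference words. The following is a minimal pure-Python sketch of that definition, not the author's evaluation script; the function name `word_error_rate` and the English example sentence are illustrative, and real evaluations usually rely on a library such as Hugging Face `evaluate` (`evaluate.load("wer")`) after normalizing the text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# 2 errors (one substitution, one deletion) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Here "sat" to "sit" is a substitution and the second "the" is a deletion, giving 2/6, roughly 0.33. Because insertions also count, WER can exceed 1.0.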

Training batch size: 3 (total: 64)
Evaluation batch size: 8 (total: 64)
Warmup steps: 1000
Training dataset: 10,000 audio files
Test dataset: 3,339 audio files
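A linear schedule with 1,000 warmup steps typically means the learning rate ramps from 0 to its peak (1e-4 here) during warmup, then decays linearly to 0 by the final step; this is the shape Hugging Face's `Trainer` produces with `lr_scheduler_type="linear"` and `warmup_steps`. The sketch below assumes that shape. The function name is illustrative, and `total_steps=10_000` is a hypothetical placeholder, since the card does not state the total number of optimizer steps.

```python
def linear_warmup_lr(step: int, peak_lr: float = 1e-4,
                     warmup_steps: int = 1000, total_steps: int = 10_000) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0.

    total_steps is a hypothetical placeholder, not a value from the model card.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warmup period.
        return peak_lr * step / max(1, warmup_steps)
    # After warmup, decay linearly from peak_lr to 0 at total_steps (clamped at 0).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 500, 1000, 5500, 10_000):
    print(s, linear_warmup_lr(s))
```

The learning rate peaks exactly at step 1,000 and reaches half the peak midway through the decay phase.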

Core Capabilities

Automatic speech recognition (ASR) for the Wolof language
Integration with Hugging Face's Transformers library
Customizable preprocessing and inference pipeline
Expects 16 kHz audio input (audio at other sample rates must be resampled first)
Language-model-assisted decoding for improved accuracy
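Fine-tuned wav2vec2 ASR models are trained with CTC and emit one token prediction per audio frame, so those frame predictions must be collapsed into text. The sketch below shows plain greedy CTC decoding: merge repeated IDs, drop the blank, and map the word delimiter to a space. The function name, toy vocabulary, and frame IDs are hypothetical; the real model's IDs come from its tokenizer. The "-lm" variant presumably replaces this greedy step with beam search scored by its language model (in Transformers, `Wav2Vec2ProcessorWithLM` wraps the pyctcdecode decoder for this purpose), which is where the language-model accuracy gain comes from.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse per-frame argmax IDs into text using the CTC rules."""
    chars = []
    previous = None
    for idx in frame_ids:
        # CTC rule 1: merge consecutive repeats of the same ID.
        if idx != previous:
            # CTC rule 2: drop the blank token (the pad token in wav2vec2 vocabularies).
            if idx != blank_id:
                chars.append(id_to_char[idx])
        previous = idx
    # wav2vec2 CTC vocabularies typically use "|" as the word boundary.
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; the real model's IDs come from its tokenizer.
vocab = {0: "<pad>", 1: "w", 2: "a", 3: "|", 4: "o"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2, 1, 1, 0], vocab))  # waaw
```

The blank between the two runs of `a` is what lets CTC emit a genuine double letter, as in "waaw"; without it, the repeats would merge into a single `a`.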

Frequently Asked Questions

Q: What makes this model unique?

This model addresses a significant gap in speech recognition technology for Wolof, a traditionally under-resourced language. Its 21.26% WER makes it a practical tool for Wolof speech-processing tasks.

Q: What are the recommended use cases?

The model is well suited to transcribing Wolof speech in applications such as speech-to-text services, language documentation, accessibility tools, and academic research on West African languages. Its output can be improved further through spell checking and additional language-model integration.
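Spell checking, one of the suggested improvements, can be as simple as snapping each out-of-vocabulary word in a transcript to its closest entry in a Wolof word list. This sketch uses Python's standard `difflib.get_close_matches`; the function name `spell_correct`, the 0.8 similarity cutoff, and the two-word vocabulary are illustrative assumptions, and a real system would need a large curated lexicon.

```python
import difflib


def spell_correct(transcript: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Replace each out-of-vocabulary word with its closest vocabulary entry, if any."""
    known = set(vocabulary)
    corrected = []
    for word in transcript.split():
        if word in known:
            corrected.append(word)
            continue
        # get_close_matches ranks candidates by difflib's similarity ratio
        # and returns only those scoring at least `cutoff`.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


# Illustrative vocabulary only; "jërëjef" is a misspelling of "jërëjëf".
print(spell_correct("jërëjef waaw", ["jërëjëf", "waaw"]))  # jërëjëf waaw
```

Words with no candidate above the cutoff are left unchanged, so the corrector never replaces a word it has no good match for.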