ASR Wav2vec2 DVoice Wolof
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Framework | SpeechBrain + PyTorch |
| Test WER | 16.05% |
| Test CER | 4.83% |
What is asr-wav2vec2-dvoice-wolof?
This is an automatic speech recognition (ASR) model for the Wolof language, built on the wav2vec 2.0 architecture and trained on the DVoice Wolof dataset. It applies transfer learning to a low-resource African language by fine-tuning Facebook's multilingual wav2vec2-large-xlsr-53 checkpoint for Wolof speech recognition.
Implementation Details
The model has two blocks: a unigram tokenizer that converts words into subword units, and an acoustic model based on wav2vec 2.0 with CTC decoding. It is built with the SpeechBrain framework and expects 16 kHz single-channel audio input.
- Pretrained wav2vec 2.0 backbone (facebook/wav2vec2-large-xlsr-53)
- CTC decoding with greedy search
- Automatic audio normalization (resampling and mono-channel selection) when transcribing a file
- Support for GPU inference
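To make the "CTC decoding with greedy search" step concrete, here is a minimal sketch in plain Python. It assumes the acoustic model has already produced one best label per frame (an argmax over the output vocabulary). The toy vocabulary, the blank index `0`, and the `▁` word-boundary marker are illustrative assumptions, not the model's actual tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse runs of repeated frame labels, then drop CTC blank labels."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank_id]


# Hypothetical toy vocabulary of subword units (assumption: id 0 is the blank).
vocab = {0: "<blank>", 1: "▁na", 2: "nga", 3: "▁def"}

# Hypothetical per-frame argmax output of the acoustic model.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

token_ids = ctc_greedy_decode(frames)
# Join subword pieces and turn the word-boundary marker back into spaces.
text = "".join(vocab[i] for i in token_ids).replace("▁", " ").strip()
print(token_ids, text)
```

Greedy search keeps only the single most likely label per frame, which is fast but ignores alternative paths that a beam search would explore.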
Core Capabilities
- Word Error Rate (WER) of 16.05% on the test set
- Character Error Rate (CER) of 4.83% on the test set
- Expects 16 kHz audio input; files at other sampling rates are resampled automatically
- Supports real-time transcription of Wolof speech
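For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed: the total edit distance between reference and hypothesis, divided by the total reference length, over words or characters respectively. This is an illustration under assumptions, not the scoring code used for the reported numbers. The example sentences are made up, and counting spaces as characters for CER is a choice made here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    split = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(ref), split(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return 100.0 * edits / total


# Made-up reference/hypothesis pairs, for illustration only.
refs = ["nanga def", "mangi fi rekk"]
hyps = ["nanga def", "mangi fi rek"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
print(f"WER={wer:.2f}%  CER={cer:.2f}%")
```

One misspelled word counts as a full word error but only a single character error, which is why CER is typically much lower than WER, as in the figures above.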
Frequently Asked Questions
Q: What makes this model unique?
This model is built specifically for Wolof, a low-resource African language, and is part of the DVoice initiative to improve voice technology accessibility for African languages. It reaches a 16.05% WER and 4.83% CER on the test set while requiring minimal preprocessing of input audio.
Q: What are the recommended use cases?
The model is ideal for Wolof speech transcription tasks, particularly in applications requiring automatic subtitling, voice command systems, or speech-to-text services for Wolof speakers. It's especially valuable for organizations working with Wolof-speaking communities or developing language technology solutions for West Africa.
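A transcription task along these lines might look like the sketch below, which uses SpeechBrain's `EncoderASR` interface. The repository id `speechbrain/asr-wav2vec2-dvoice-wolof`, the cache directory, and the audio path are assumptions made for illustration. Running it requires the `speechbrain` and `transformers` packages and downloads the model on first use.

```python
# Assumed Hugging Face repo id; check the model's hosting page before use.
MODEL_SOURCE = "speechbrain/asr-wav2vec2-dvoice-wolof"
# Hypothetical local directory where the downloaded files are cached.
SAVE_DIR = "pretrained_models/asr-wav2vec2-dvoice-wolof"


def load_model(device="cpu"):
    """Load the pretrained model; pass device="cuda" for GPU inference."""
    try:
        from speechbrain.inference.ASR import EncoderASR  # SpeechBrain >= 1.0
    except ImportError:
        from speechbrain.pretrained import EncoderASR  # older releases
    return EncoderASR.from_hparams(
        source=MODEL_SOURCE,
        savedir=SAVE_DIR,
        run_opts={"device": device},
    )


def transcribe(path, device="cpu"):
    """Transcribe one audio file and return the decoded text."""
    return load_model(device).transcribe_file(path)


# Example call (hypothetical file path):
# print(transcribe("wolof_sample.wav"))
```

For batch work such as subtitling, it would be cheaper to call `load_model` once and reuse the returned object across files instead of reloading it per call.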