ASR Wav2vec2 DVoice Wolof

Property	Value
License	Apache 2.0
Framework	SpeechBrain + PyTorch
Test WER	16.05%
Test CER	4.83%

What is asr-wav2vec2-dvoice-wolof?

This is an automatic speech recognition (ASR) model for the Wolof language, built on the wav2vec 2.0 architecture and fine-tuned on the DVoice Wolof dataset. It applies transfer learning from Facebook's multilingual wav2vec2-large-xlsr-53 checkpoint, adapting it to Wolof speech recognition.

Implementation Details

The model consists of two blocks: a unigram tokenizer that converts words into subword units, and an acoustic model based on wav2vec 2.0 with CTC decoding. It is built on the SpeechBrain framework and expects 16 kHz single-channel (mono) audio input.

Pretrained wav2vec 2.0 encoder (facebook/wav2vec2-large-xlsr-53)
CTC decoding with greedy search
Automatic audio normalization capabilities
Support for GPU inference
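
To make "CTC decoding with greedy search" concrete, here is a minimal sketch of the collapsing rule: take the most likely label per frame, merge consecutive repeats, then drop the blank symbol. It is an illustration, not the model's actual decoding code. The function name ctc_greedy_decode, the blank index 0, and the example frame labels are all assumptions made for this sketch.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse consecutive repeated labels, then remove blanks.

    frame_ids: per-frame argmax label indices (assumed already computed).
    blank_id:  index of the CTC blank symbol (0 is an assumption here).
    """
    # Step 1: merge runs of identical labels into a single label.
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Step 2: drop the blank symbol; what remains is the token sequence.
    return [label for label in collapsed if label != blank_id]


# Hypothetical per-frame labels, not real model output.
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0, 2]
print(ctc_greedy_decode(frames))  # [3, 3, 5, 2]
```

The blank between the two runs of 3 is what lets the same token appear twice in a row in the output. In the real pipeline the resulting indices would then be mapped back to subword units by the tokenizer.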

Core Capabilities

Achieves a 16.05% word error rate (WER) on the test set
Achieves a 4.83% character error rate (CER) on the test set
Expects 16 kHz audio; input at other sample rates is resampled automatically
Supports real-time transcription of Wolof speech
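
For readers unfamiliar with the two metrics above, the following sketch shows one common way to compute them: the edit distance between reference and hypothesis, summed over all utterances and divided by the total reference length. It is not the evaluation script used for the reported numbers. The helper names, the corpus-level aggregation, the choice to count spaces as characters in CER, and the placeholder strings are all assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Error rate in percent; unit is "word" (WER) or "char" (CER)."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    if total_len == 0:
        raise ValueError("reference transcripts are empty")
    return 100.0 * total_edits / total_len


# Placeholder strings, not Wolof data: one of four words is wrong.
refs = ["a b c d"]
hyps = ["a x c d"]
print(error_rate(refs, hyps, unit="word"))           # 25.0
print(f"{error_rate(refs, hyps, unit='char'):.2f}")  # 14.29
```

The example also shows why CER is typically lower than WER: a single wrong character makes a whole word count as an error, but it is only one character among many.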

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically designed for Wolof, a low-resource African language, and is part of the DVoice initiative to improve voice technology accessibility for African languages. It reaches a 16.05% WER and a 4.83% CER on the test set while requiring minimal preprocessing of input audio.

Q: What are the recommended use cases?

The model is ideal for Wolof speech transcription tasks, particularly in applications requiring automatic subtitling, voice command systems, or speech-to-text services for Wolof speakers. It's especially valuable for organizations working with Wolof-speaking communities or developing language technology solutions for West Africa.
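
For transcription tasks like these, loading the model through SpeechBrain might look like the sketch below. It relies on SpeechBrain's EncoderASR interface (from_hparams and transcribe_file). The repository ID speechbrain/asr-wav2vec2-dvoice-wolof, the savedir path, and the wrapper name transcribe_wolof are assumptions, not details confirmed by this page; check the model's hosting page for the actual identifier.

```python
def transcribe_wolof(wav_path,
                     source="speechbrain/asr-wav2vec2-dvoice-wolof",  # assumed repo ID
                     device="cpu"):
    """Transcribe one audio file; pass device="cuda" for GPU inference."""
    # Imported lazily so this module loads even without SpeechBrain installed.
    try:
        from speechbrain.inference.ASR import EncoderASR  # SpeechBrain 1.0 and later
    except ImportError:
        from speechbrain.pretrained import EncoderASR     # older releases

    asr_model = EncoderASR.from_hparams(
        source=source,
        savedir="pretrained_models/asr-wav2vec2-dvoice-wolof",  # assumed cache dir
        run_opts={"device": device},
    )
    # transcribe_file loads the audio and returns the decoded text.
    return asr_model.transcribe_file(wav_path)
```

A call such as transcribe_wolof("recording.wav", device="cuda") would download the checkpoint on first use, where recording.wav is a hypothetical file. This wrapper reloads the model on every call, so a subtitling or voice-command service would more likely load it once and reuse it.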