asr-wav2vec2-dvoice-darija

speechbrain

Specialized ASR model for Darija (Moroccan Arabic) built on wav2vec 2.0, achieving 18.28% WER on test data. It uses a CTC/Attention architecture and unigram tokenization.

Property	Value
Model Type	Speech Recognition (ASR)
Architecture	wav2vec 2.0 + CTC/Attention
Performance	18.28% WER (Test), 5.85% CER (Test)
Source	HuggingFace
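
For readers unfamiliar with the two metrics in the table, the sketch below shows how WER and CER are commonly computed: the edit distance between a reference transcript and the model output, divided by the reference length. The function names and toy strings are illustrative and are not taken from the model's evaluation code; counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Toy example: one substitution and one deletion over four reference words.
print(wer("a b c d", "a x c"))  # 0.5
print(cer("abcd", "abxd"))      # 0.25
```

Read this way, an 18.28% WER means roughly 18 word-level errors per 100 reference words.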

What is asr-wav2vec2-dvoice-darija?

This is an automatic speech recognition model built for Darija (the Moroccan Arabic dialect), developed as part of the DVoice initiative. It combines Facebook's wav2vec 2.0 architecture with CTC/Attention mechanisms and is trained on the DVoice Darija dataset. It is aimed at low-resource African languages, which speech technology has traditionally underserved.

Implementation Details

The model architecture consists of two main components: a unigram tokenizer that converts words into subword units, and an acoustic model based on wav2vec 2.0. The acoustic model starts from the pretrained facebook/wav2vec2-large-xlsr-53 checkpoint, adds two DNN layers on top, and is fine-tuned on Darija speech data.

Supports 16kHz audio input (single channel)
Automatically normalizes input audio (resampling and mono channel selection) when needed
Implements CTC greedy decoder for inference
Built using the SpeechBrain framework
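
As a rough illustration of the CTC greedy decoding step listed above, the sketch below picks the highest-scoring token per frame, collapses consecutive repeats, and drops the blank symbol. It is a pure-Python illustration of the general technique, not SpeechBrain's implementation; the function name, the toy scores, and the choice of index 0 for the blank token are assumptions.

```python
def ctc_greedy_decode(frame_scores, blank_id=0):
    """Greedy CTC decoding.

    frame_scores: list of per-frame score lists, one score per vocabulary entry.
    blank_id: index of the CTC blank token (assumed to be 0 here).
    Returns the decoded list of token ids.
    """
    # Step 1: best token per frame (argmax over the vocabulary).
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # Step 2: collapse consecutive repeats, then remove blanks.
    decoded = []
    previous = None
    for token in best_path:
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


# Toy scores over a 3-entry vocabulary (0 = blank). The per-frame argmax
# path is [1, 1, 0, 1, 2, 2, 0].
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
    [0.9, 0.05, 0.05],
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank between the second and third frames is what allows token 1 to appear twice in the output; without it, the repeated 1s would collapse into a single token. In the real model, the resulting ids would then be mapped back to text by the unigram tokenizer.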

Core Capabilities

Direct transcription of Darija speech to text
Achieves 18.28% Word Error Rate on test data
Supports GPU inference for faster processing
Handles automatic audio preprocessing
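
A minimal usage sketch with SpeechBrain's EncoderASR interface might look like the following. The repository id speechbrain/asr-wav2vec2-dvoice-darija is inferred from the model name and organization shown on this page, and the savedir path, the helper names, and the audio file path are hypothetical. Running it requires the speechbrain and transformers packages and downloads the model on first use.

```python
def build_run_opts(use_gpu):
    """Build SpeechBrain run options that select the inference device."""
    return {"device": "cuda" if use_gpu else "cpu"}


def transcribe_darija(wav_path, use_gpu=False):
    """Transcribe one audio file; returns the predicted text.

    The import is done lazily so this module loads without SpeechBrain.
    """
    try:
        from speechbrain.inference.ASR import EncoderASR  # SpeechBrain 1.0+
    except ImportError:
        from speechbrain.pretrained import EncoderASR     # older releases

    asr_model = EncoderASR.from_hparams(
        source="speechbrain/asr-wav2vec2-dvoice-darija",         # assumed repo id
        savedir="pretrained_models/asr-wav2vec2-dvoice-darija",  # hypothetical cache dir
        run_opts=build_run_opts(use_gpu),
    )
    # transcribe_file loads the audio and normalizes it before decoding.
    return asr_model.transcribe_file(wav_path)


if __name__ == "__main__":
    # "my_recording.wav" is a placeholder path, not a file shipped with the model.
    print(transcribe_darija("my_recording.wav", use_gpu=False))
```

Passing use_gpu=True only helps when a CUDA-capable device and a CUDA build of PyTorch are available; otherwise keep the default CPU setting.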

Frequently Asked Questions

Q: What makes this model unique?

This model is designed specifically for Darija, a traditionally under-resourced language. It is part of the DVoice initiative, which aims to improve access to voice technology for African languages. Combining wav2vec 2.0 with CTC/Attention mechanisms, it reaches an 18.28% WER and a 5.85% CER on the test set.

Q: What are the recommended use cases?

The model suits applications that transcribe Darija speech, such as voice assistants and transcription services. More generally, it fits any speech-to-text pipeline that needs to handle the Moroccan Arabic dialect.