DVoice Kabyle ASR Model

Property	Value
License	Apache 2.0
Framework	SpeechBrain, PyTorch
Architecture	wav2vec 2.0 + CTC/Attention
Test Performance	WER: 24.80%, CER: 6.55%
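WER and CER are both edit-distance rates. Each counts the fewest substitutions, deletions and insertions needed to turn the hypothesis into the reference, then divides by the reference length: in words for WER, in characters for CER. The sketch below is a plain-Python illustration, not the scoring code behind the figures above. It scores one utterance at a time, whereas reported test figures usually sum edits over all utterances before dividing. Conventions also differ on details such as whether spaces count as characters; this sketch counts them.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: fewest substitutions, deletions and insertions."""
    # At the start of row i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # delete ref[i-1]
                          curr[j - 1] + 1,      # insert hyp[j-1]
                          prev[j - 1] + cost)   # substitute or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edits divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over characters (spaces included)."""
    return error_rate(list(reference), list(hypothesis))
```

For example, wer("the cat sat on the mat", "the cat sit on mat") finds one substitution and one deletion over six reference words, giving 2/6, or about 0.333.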

What is dvoice-kabyle?

DVoice-kabyle is an automatic speech recognition (ASR) model for the Kabyle language, developed by AIOX Labs. It combines the wav2vec 2.0 architecture with CTC/Attention to transcribe Kabyle speech to text. The model was trained on the CommonVoice dataset and adds to the small set of speech technologies available for low-resource languages.

Implementation Details

The model is a pipeline of two linked blocks: a unigram tokenizer that transforms words into subword units, and an acoustic model built on a fine-tuned wav2vec 2.0 encoder (facebook/wav2vec2-large-xlsr-53). The system expects 16 kHz, single-channel audio and automatically normalizes input audio that does not match this format.
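A unigram tokenizer scores every subword piece independently, so encoding a word means finding the segmentation whose piece log-probabilities sum highest. Dynamic programming (Viterbi) solves this exactly. The sketch below assumes a tiny hand-written vocabulary, TOY_LOG_PROBS, with hypothetical values. The real tokenizer's vocabulary and probabilities are learned during training and are not reproduced here.

```python
import math
from typing import Dict, List

# Hypothetical toy vocabulary: subword piece -> log-probability (illustrative values only).
TOY_LOG_PROBS: Dict[str, float] = {
    "play": -2.0, "ing": -2.5, "pla": -3.0, "ying": -3.0,
    **{ch: -5.0 for ch in "playing"},  # single characters as a fallback
}


def unigram_segment(word: str, log_probs: Dict[str, float]) -> List[str]:
    """Most probable segmentation of `word` under a unigram subword model (Viterbi)."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best total log-prob for word[:i]
    best[0] = 0.0
    back = [0] * (n + 1)          # back[i]: start index of the last piece ending at i
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in log_probs and best[start] > -math.inf:
                score = best[start] + log_probs[piece]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError(f"cannot segment {word!r} with this vocabulary")
    # Follow back-pointers from the end of the word to recover the pieces.
    pieces, end = [], n
    while end > 0:
        pieces.append(word[back[end]:end])
        end = back[end]
    return pieces[::-1]
```

With these toy values, "playing" splits into ["play", "ing"] (total -4.5). That beats ["pla", "ying"] (-6.0) and any split that falls back to single characters.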

Pre-trained wav2vec 2.0 (XLSR-53) encoder with additional DNN layers
CTC greedy decoder for final transcription
Automatic audio normalization and resampling
SpeechBrain framework integration
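Greedy (best-path) CTC decoding takes the most likely token at each acoustic frame, merges consecutive repeats, and then removes the CTC blank symbol. The sketch below assumes per-frame scores given as nested lists and a blank index of 0. Both are illustrative assumptions rather than the model's actual configuration. In the real pipeline, the resulting ids would then be mapped back to subword units by the tokenizer.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # 1. Highest-scoring token id at every frame.
    best_path = [max(range(len(scores)), key=lambda k: scores[k]) for scores in frame_scores]
    # 2. Collapse runs of the same id, then 3. remove the blank symbol.
    return [token for token, _ in groupby(best_path) if token != blank]
```

The order of the steps matters. A blank between two identical tokens is what lets a genuine repeat survive: the best path 1 1 blank 1 2 2 blank decodes to [1, 1, 2].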

Core Capabilities

Kabyle speech recognition at 24.80% WER and 6.55% CER on the test set
Real-time audio transcription support
GPU inference compatibility
Automatic audio preprocessing
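The model expects 16 kHz, single-channel audio. The sketch below shows two plain-Python normalization steps that produce that format: averaging channels to get mono, which is one common choice, and resampling by linear interpolation. Both functions are hypothetical helpers, not SpeechBrain's implementation. Linear interpolation applies no anti-aliasing filter, so for real audio a band-limited resampler such as torchaudio.functional.resample is the safer choice.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # sample rate the model was trained on


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Downmix by averaging the channel samples at each time step."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate         # fractional position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

For instance, one second of 44.1 kHz stereo audio would be downmixed with to_mono and then passed to resample_linear(mono, 44_100), yielding 16,000 samples.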

Frequently Asked Questions

Q: What makes this model unique?

This model targets Kabyle specifically, a language generally considered low-resource. It is part of the broader DVoice initiative, which aims to provide voice technologies for African languages, and it gives Kabyle language technology development a ready-made speech recognition component.

Q: What are the recommended use cases?

The model suits Kabyle speech transcription tasks such as content creation, accessibility services, and language preservation. It is trained on 16 kHz audio, and input at other sample rates is resampled automatically. It can be deployed on either CPU or GPU.
