DVoice Amharic ASR Model

Property	Value
Model Type	Speech Recognition (ASR)
Architecture	wav2vec 2.0 + CTC/Attention
Test WER	24.92%
Test CER	6.57%
Developer	AIOX Labs
Framework	SpeechBrain

What is dvoice-amharic?

DVoice-amharic is an automatic speech recognition (ASR) model for the Amharic language, developed as part of the DVoice initiative to improve voice technology accessibility for African languages. It combines Facebook's wav2vec 2.0 architecture with CTC/Attention mechanisms and was trained on the ALFFA Amharic dataset.

Implementation Details

The system has two blocks: a unigram tokenizer that converts words into subword units, and an acoustic model based on wav2vec 2.0. It is built on the SpeechBrain framework and uses the pretrained wav2vec2-large-xlsr-53 model with additional DNN layers on top.
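As a rough illustration of the tokenizer block, the sketch below shows how a unigram model can pick the most probable split of a word into subword pieces using a Viterbi search. The vocabulary, its log-probabilities, and the function name `unigram_segment` are hypothetical; the model's real subword inventory is not given here, and a toy Latin-script word is used only for readability.

```python
import math


def unigram_segment(word, piece_logprob):
    """Return the highest-scoring split of `word` into known pieces (Viterbi)."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best score for word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece not in piece_logprob:
                continue
            score = best[start] + piece_logprob[piece]
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError(f"cannot segment {word!r} with this vocabulary")
    # Walk the back-pointers from the end of the word to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


# Hypothetical toy vocabulary with made-up log-probabilities.
toy_vocab = {
    "spee": -3.0, "sp": -2.0, "ee": -2.5, "ch": -2.0,
    "s": -4.0, "p": -4.0, "e": -3.5, "c": -4.0, "h": -4.0,
}
print(unigram_segment("speech", toy_vocab))  # ['spee', 'ch']
```

Here "spee" + "ch" (score -5.0) beats "sp" + "ee" + "ch" (score -6.5), so the two-piece split is returned.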

Expects 16 kHz audio input; other inputs are resampled automatically
Uses a CTC greedy decoder for text generation
Achieves a 24.92% Word Error Rate (WER) and a 6.57% Character Error Rate (CER) on the test set
Normalizes audio automatically before inference
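The greedy CTC decoding step listed above can be sketched in a few lines: take the highest-scoring token at each frame, collapse consecutive repeats, then drop the blank symbol. This is a minimal illustration, not the SpeechBrain implementation; the function name, the blank index of 0, and the score matrix are assumptions made for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, blank_id=0):
    """Greedy CTC decoding: per-frame argmax, merge repeats, remove blanks."""
    # Pick the best-scoring token index for every frame.
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # Collapse runs of the same token into a single occurrence.
    collapsed = [token for token, _ in groupby(best_ids)]
    # Remove the blank symbol that CTC uses to separate repeated tokens.
    return [token for token in collapsed if token != blank_id]


# Hypothetical scores for 6 frames over 3 tokens (index 0 is the assumed blank).
frame_scores = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.80, 0.10, 0.10],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
]
print(ctc_greedy_decode(frame_scores))  # [1, 1, 2]
```

The blank between the two runs of token 1 is what lets the decoder emit the same token twice in a row.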

Core Capabilities

End-to-end Amharic speech recognition
Automatic audio preprocessing and normalization
GPU inference support
Easy integration through SpeechBrain API
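A minimal sketch of what integration through SpeechBrain might look like is shown below. Several details are assumptions rather than facts stated in this card: the Hugging Face repository id `aioxlabs/dvoice-amharic`, the local cache directory, and the choice of the `EncoderASR` interface (picked because the model decodes with CTC). The helper names `build_run_opts` and `transcribe` are hypothetical.

```python
def build_run_opts(use_gpu: bool) -> dict:
    """Build the run_opts dict that tells SpeechBrain which device to use."""
    return {"device": "cuda"} if use_gpu else {"device": "cpu"}


def transcribe(audio_path: str, use_gpu: bool = False) -> str:
    """Load the pretrained model and transcribe one audio file."""
    # Imported lazily so this module can be loaded without SpeechBrain installed.
    try:
        from speechbrain.inference.ASR import EncoderASR  # SpeechBrain 1.0 and later
    except ImportError:
        from speechbrain.pretrained import EncoderASR  # older SpeechBrain releases

    asr_model = EncoderASR.from_hparams(
        source="aioxlabs/dvoice-amharic",            # assumed repository id
        savedir="pretrained_models/dvoice-amharic",  # hypothetical cache directory
        run_opts=build_run_opts(use_gpu),
    )
    # transcribe_file handles audio normalization before running the model.
    return asr_model.transcribe_file(audio_path)
```

Calling `transcribe("sample.wav", use_gpu=True)` would then run inference on the GPU, where `sample.wav` stands in for any local recording. The first call downloads the model files, so it needs network access.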

Frequently Asked Questions

Q: What makes this model unique?

This model targets Amharic, a traditionally under-resourced language, and reaches a 24.92% WER and a 6.57% CER on its test set. It is part of the larger DVoice initiative, which focuses on African language technologies.
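To make the error-rate figures concrete: WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words, so 24.92% corresponds to roughly 25 word edits per 100 reference words. CER is the same ratio computed over characters. The sketch below shows the calculation on an illustrative English sentence pair; the function names and sentences are hypothetical, and counting spaces as characters in CER is an assumption of this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.2%}")  # WER: 33.33%
```

In this pair one word is substituted and one is deleted, giving 2 edits over 6 reference words.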

Q: What are the recommended use cases?

The model is suited to Amharic speech-to-text transcription, in both research and practical applications in Amharic-speaking contexts.
