DVoice Amharic ASR Model
| Property | Value |
|---|---|
| Model Type | Speech Recognition (ASR) |
| Architecture | wav2vec 2.0 + CTC/Attention |
| Test WER | 24.92% |
| Test CER | 6.57% |
| Developer | AIOX Labs |
| Framework | SpeechBrain |
What is dvoice-amharic?
DVoice-amharic is an automatic speech recognition (ASR) model for the Amharic language, developed by AIOX Labs as part of the DVoice initiative to make voice technology more accessible for African languages. It combines Facebook's wav2vec 2.0 architecture with CTC/Attention mechanisms, is trained on the ALFFA Amharic dataset, and reports a 24.92% word error rate (WER) and a 6.57% character error rate (CER) on the test set.
Implementation Details
The model consists of two blocks: a unigram tokenizer that converts words into subword units, and an acoustic model based on wav2vec 2.0. The acoustic model starts from the pretrained wav2vec2-large-xlsr-53 checkpoint and adds DNN layers on top. Both blocks are built with the SpeechBrain framework.
- Expects 16 kHz audio input; other audio is resampled and normalized automatically
- Uses a CTC greedy decoder to turn the acoustic model's output into text
- Reports a 24.92% Word Error Rate (WER) and a 6.57% Character Error Rate (CER) on the test set
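The steps on either side of the acoustic model can be sketched in plain Python: bringing audio to 16 kHz mono, and collapsing frame-level CTC outputs into a token sequence with a greedy decoder. This is a minimal illustration, not SpeechBrain's actual implementation. The function names, the linear-interpolation resampler (which has no anti-aliasing filter), and the choice of index 0 for the CTC blank are all assumptions made for the example.

```python
from itertools import groupby

TARGET_RATE = 16_000  # sample rate the model expects (16 kHz)


def to_mono(frames):
    """Average the channels of each frame; `frames` is a list of per-channel tuples."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampler (illustrative only, no anti-aliasing)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * step, last)  # clamp so we never read past the end
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def ctc_greedy_decode(frame_scores, blank=0):
    """Pick the best token per frame, merge consecutive repeats, then drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    return [token for token, _ in groupby(best) if token != blank]


# Hypothetical per-frame scores over a 3-token vocabulary (0 = blank).
scores = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
          [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
token_ids = ctc_greedy_decode(scores)  # repeated token 1 survives because a blank separates it
```

In the real model the resulting token ids would be mapped back to text by the unigram tokenizer. Repeats are merged before blanks are removed, which is what allows the same token to appear twice in a row in the output.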
Core Capabilities
- End-to-end Amharic speech recognition
- Automatic audio preprocessing and normalization
- GPU inference support
- Easy integration through SpeechBrain API
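A minimal inference sketch using SpeechBrain's `EncoderASR` interface follows. The model identifier `aioxlabs/dvoice-amharic`, the `savedir` path, and the helper names `pick_device`, `load_asr`, and `transcribe` are assumptions for illustration. Check the published model page for the exact source string. The sketch also assumes `speechbrain`, `torch`, and `transformers` are installed.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" when a GPU is requested and available, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if prefer_gpu and torch.cuda.is_available() else "cpu"


def load_asr(source="aioxlabs/dvoice-amharic",          # assumed model identifier
             savedir="pretrained_models/dvoice-amharic",  # assumed local cache path
             device="cpu"):
    """Download (or reuse) the pretrained model and place it on `device`."""
    try:
        from speechbrain.inference.ASR import EncoderASR  # SpeechBrain 1.0 and later
    except ImportError:
        from speechbrain.pretrained import EncoderASR     # older releases
    return EncoderASR.from_hparams(
        source=source, savedir=savedir, run_opts={"device": device}
    )


def transcribe(path, model=None):
    """Transcribe one audio file; loads the model on first use if none is given."""
    if model is None:
        model = load_asr(device=pick_device())
    return model.transcribe_file(path)
```

Calling `transcribe("recording.wav")` with a hypothetical file path would return the decoded Amharic text. Loading the model once with `load_asr` and passing it to repeated `transcribe` calls avoids reloading the weights for every file.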
Frequently Asked Questions
Q: What makes this model unique?
This model is built specifically for Amharic, a traditionally under-resourced language, and reports a 24.92% WER and a 6.57% CER on its test set. It is part of the larger DVoice initiative, which focuses on speech technology for African languages.
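For readers unfamiliar with the metric, WER is the number of word-level edits (substitutions, insertions, deletions) needed to turn the model's output into the reference transcript, divided by the number of reference words. CER is the same ratio computed over characters. The sketch below shows one common way to compute both at corpus level. It is not the evaluation script behind the reported numbers, and the function names and the choice to count spaces as characters are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    split = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(ref), split(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("reference transcripts are empty")
    return 100.0 * edits / total


references = ["ሰላም ለዓለም"]   # reference transcript (two words)
hypotheses = ["ሰላም ዓለም"]    # hypothetical model output with one wrong word
wer = error_rate(references, hypotheses, unit="word")  # 1 edit / 2 words
cer = error_rate(references, hypotheses, unit="char")  # 1 edit / 8 characters
```

One wrong word out of two gives a 50% WER here, while the CER is only 12.5% because a single character is missing. This gap is why a model's CER is typically much lower than its WER.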
Q: What are the recommended use cases?
The model is intended for Amharic speech-to-text transcription. It is suitable for both research and practical applications in Amharic-speaking contexts.