dvoice-amharic

Maintained By
aioxlabs

DVoice Amharic ASR Model

Property        Value
Model Type      Speech Recognition (ASR)
Architecture    wav2vec 2.0 + CTC/Attention
Test WER        24.92%
Test CER        6.57%
Developer       AIOX Labs
Framework       SpeechBrain

What is dvoice-amharic?

DVoice-amharic is an automatic speech recognition (ASR) model for the Amharic language, developed as part of the DVoice initiative to improve voice technology accessibility for African languages. It combines Facebook's wav2vec 2.0 architecture with CTC/Attention mechanisms and was trained on the ALFFA Amharic dataset.

Implementation Details

The model consists of two blocks: a unigram tokenizer that converts words into subword units, and an acoustic model based on wav2vec 2.0. The acoustic model is built with the SpeechBrain framework and combines the pretrained wav2vec2-large-xlsr-53 model with additional DNN layers.

  • Accepts 16 kHz audio input; other audio is resampled and normalized automatically
  • Uses a CTC greedy decoder for text generation
  • Achieves a 24.92% Word Error Rate (WER) and a 6.57% Character Error Rate (CER) on the test set
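
To make the decoding step concrete, here is a minimal pure-Python sketch of CTC greedy decoding: take the highest-scoring token for each audio frame, merge consecutive repeats, then drop the blank symbol. The toy scores, the three-token vocabulary, and the choice of index 0 as the blank are assumptions made for illustration; they are not taken from the model's actual configuration.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    blank_id=0 is an illustrative assumption, not the model's real setting.
    """
    # Step 1: pick the highest-scoring token id for every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Step 2: collapse runs of identical consecutive token ids.
    collapsed = [token for token, _ in groupby(best_path)]
    # Step 3: remove the blank symbol that separates genuine repeats.
    return [token for token in collapsed if token != blank_id]


# Toy per-frame scores over a hypothetical vocabulary {0: blank, 1, 2}.
scores = [
    [0.10, 0.80, 0.10],  # best token: 1
    [0.20, 0.70, 0.10],  # best token: 1 (repeat, merged in step 2)
    [0.90, 0.05, 0.05],  # best token: 0 (blank)
    [0.10, 0.60, 0.30],  # best token: 1 (kept, because a blank came before)
    [0.10, 0.20, 0.70],  # best token: 2
]

print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

In the real pipeline the resulting ids would be subword units, which the tokenizer then joins back into words.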

Core Capabilities

  • End-to-end Amharic speech recognition
  • Automatic audio preprocessing and normalization
  • GPU inference support
  • Easy integration through SpeechBrain API
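
As a rough sketch of what that integration could look like, the snippet below loads the model through SpeechBrain's `EncoderASR` interface and transcribes one file. The Hugging Face identifier `aioxlabs/dvoice-amharic` is inferred from the maintainer and model name on this page, and the cache directory name is hypothetical; both should be checked against the official model card. It also assumes the `speechbrain` and `transformers` packages are installed.

```python
# Assumed model identifier (maintainer + model name from this page).
MODEL_SOURCE = "aioxlabs/dvoice-amharic"
# Hypothetical local cache directory for the downloaded files.
SAVE_DIR = "pretrained_models/dvoice-amharic"


def load_dvoice_amharic(device: str = "cpu"):
    """Load the pretrained model; weights are downloaded on first use."""
    try:
        # Import path used by SpeechBrain 1.0 and later.
        from speechbrain.inference.ASR import EncoderASR
    except ImportError:
        # Import path used by older SpeechBrain releases.
        from speechbrain.pretrained import EncoderASR

    return EncoderASR.from_hparams(
        source=MODEL_SOURCE,
        savedir=SAVE_DIR,
        run_opts={"device": device},  # e.g. "cuda" for GPU inference
    )


def transcribe(audio_path: str, device: str = "cpu") -> str:
    """Transcribe a single audio file to Amharic text."""
    model = load_dvoice_amharic(device)
    return model.transcribe_file(audio_path)
```

Calling `transcribe("sample.wav", device="cuda")` would then run inference on a GPU, where `sample.wav` stands in for any local recording.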

Frequently Asked Questions

Q: What makes this model unique?

This model is built specifically for Amharic, a traditionally under-resourced language, and reports a 24.92% WER and a 6.57% CER on its test set. It is part of the larger DVoice initiative focused on African language technologies.
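
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. This is a generic illustration with made-up English sentences, not the evaluation script used to obtain the figures above.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one substituted word and one dropped word out of six.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {wer(reference, hypothesis):.2%}")  # WER = 33.33%
print(f"CER = {cer(reference, hypothesis):.2%}")
```

A WER of 24.92% therefore means roughly one word-level error for every four reference words.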

Q: What are the recommended use cases?

The model is intended for transcribing Amharic speech to text. It is suitable for both research and practical applications in Amharic-speaking contexts.
