hubert-large-arabic-egyptian
Property | Value |
---|---|
Parameter Count | 315M |
License | cc-by-nc-4.0 |
WER (Test) | 25.9% |
What is hubert-large-arabic-egyptian?
This is a state-of-the-art automatic speech recognition (ASR) model designed specifically for Egyptian Arabic. It is based on the HuBERT architecture and has been fine-tuned on both the MGB-3 dataset and the Egyptian Arabic Conversational Speech Corpus. The model represents a significant advance in Egyptian Arabic speech recognition, achieving a Word Error Rate (WER) of 25.9% on the test set.
Implementation Details
The model builds on the Arabic HuBERT-Large architecture and combines CTC and attention-based decoding. It expects 16 kHz audio input and uses a transformer-based architecture with a PyTorch backend; a minimal loading sketch follows the list below.
- 315M trainable parameters
- Uses F32 tensor type
- Implements CTC and Attention mechanisms
- Trained on MGB-3 and Egyptian Arabic Conversational Speech datasets
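Since the card does not name a toolkit, the following is a minimal inference sketch assuming a SpeechBrain-style CTC/Attention recipe hosted on the Hugging Face Hub; the model ID and audio path are placeholders, not official values.

```python
# Hypothetical inference sketch (the SpeechBrain toolkit and the model ID are assumptions).
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="<org>/hubert-large-arabic-egyptian",   # placeholder repository ID
    savedir="pretrained_models/hubert-large-arabic-egyptian",
)

# transcribe_file takes a path to an audio file; the underlying model
# operates on 16 kHz waveforms.
transcript = asr_model.transcribe_file("egyptian_arabic_sample.wav")
print(transcript)
```

If the checkpoint is instead exported as a plain CTC model, it could also be loaded through the Hugging Face transformers ASR pipeline; the CTC/Attention description above is what suggests a SpeechBrain-style interface here.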
Core Capabilities
- State-of-the-art Egyptian Arabic speech recognition
- Handles conversational speech effectively
- 23.5% WER on validation set
- 25.9% WER on test set
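The WER figures above are plain word error rates. For reference, here is a minimal sketch of computing such a score with the jiwer package (an assumption, since the card does not state which scoring tool was used):

```python
# WER computation sketch; jiwer is an assumption, not the card's stated tooling.
import jiwer

references = ["reference transcript one", "reference transcript two"]
hypotheses = ["reference transcript won", "reference transcript two"]

# WER = (substitutions + deletions + insertions) / number of reference words
error_rate = jiwer.wer(references, hypotheses)
print(f"WER: {error_rate:.1%}")
```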
Frequently Asked Questions
Q: What makes this model unique?
A: This model is optimized specifically for Egyptian Arabic, achieving state-of-the-art performance through fine-tuning on both MGB-3 and the Egyptian Arabic Conversational Speech Corpus. It is particularly notable for handling the nuances of the Egyptian dialect.
Q: What are the recommended use cases?
A: The model is ideal for transcribing Egyptian Arabic speech in applications such as conversational AI, content transcription, and speech analytics. It works best with 16 kHz audio input, so recordings at other sample rates should be resampled first (see the sketch below), and it is particularly suited to real-world Egyptian Arabic speech recognition tasks.
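A minimal resampling sketch, assuming torchaudio is available (any resampling tool would do); the file names are placeholders:

```python
# Resample an arbitrary-rate recording to the 16 kHz input the model expects.
# torchaudio is an assumption; librosa or sox would work equally well.
import torchaudio

waveform, sample_rate = torchaudio.load("recording_44k.wav")   # placeholder path
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)

torchaudio.save("recording_16k.wav", waveform, 16000)
```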