wav2vec2-large-xlsr-53-arabic

elgeish

A fine-tuned XLSR-53 model for Arabic speech recognition that achieves a 26.55% word error rate (WER) on the Common Voice test set. It expects 16 kHz audio input.

Property	Value
License	Apache 2.0
Test WER	26.55%
Validation WER	23.39%
Dataset	Common Voice 6.1 + Arabic Speech Corpus
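The WER values in the table are word error rates: the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that metric, not the evaluation script used for the reported numbers; the function name word_error_rate is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a test WER of 26.55% corresponds to roughly one word-level error for every four reference words.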

What is wav2vec2-large-xlsr-53-arabic?

This is a speech recognition model fine-tuned on Arabic data, starting from Facebook's pretrained wav2vec2-large-xlsr-53 model. It converts Arabic speech to text and represents that text in Buckwalter transliteration, an ASCII encoding of Arabic script.
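Because the output is in Buckwalter transliteration, downstream code usually needs to map it back to Arabic script. The sketch below builds the standard Buckwalter table from Unicode code point ranges; the helper name buckwalter_to_arabic is hypothetical, and a maintained transliteration library is probably a better choice in production.

```python
# Each entry: first Unicode code point, then the Buckwalter symbols for the
# consecutive Arabic characters starting at that code point.
_RANGES = [
    (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza through ghain
    (0x0640, "_fqklmnhwYyFNKaui~o"),         # tatweel, fa through ya, diacritics
    (0x0670, "`{"),                          # dagger alif, alif wasla
]

BUCKWALTER_TO_ARABIC = {
    symbol: chr(start + offset)
    for start, symbols in _RANGES
    for offset, symbol in enumerate(symbols)
}


def buckwalter_to_arabic(text):
    """Map Buckwalter symbols to Arabic script; other characters pass through."""
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)
```

For example, buckwalter_to_arabic("ktAb") yields the four letters kaf, ta, alif, ba (the word for "book"), and spaces or other unmapped characters are left unchanged.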

Implementation Details

The model was trained in two phases: first on the Arabic Speech Corpus, then further fine-tuned on Common Voice data. It requires 16 kHz audio input and produces transcriptions directly from the acoustic model's output, without a separate language model.
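Transcribing without a language model typically means greedy decoding of the per-frame predictions. Fine-tuned wav2vec 2.0 models are normally trained with CTC, so the sketch below assumes CTC greedy decoding: take the best token per frame, collapse consecutive repeats, and drop the blank token. The vocabulary and the blank id of 0 are made up for illustration and are not this model's actual vocabulary.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blanks."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when the prediction changes and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary of Buckwalter symbols.
toy_vocab = {0: "<pad>", 1: "k", 2: "t", 3: "A", 4: "b"}
decoded = ctc_greedy_decode([1, 1, 0, 2, 0, 0, 3, 3, 3, 4], toy_vocab)
```

A blank between two identical ids keeps both, which is how CTC represents doubled letters.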

Built on the pretrained wav2vec2-large-xlsr-53 model
Uses Buckwalter transliteration for Arabic text representation
Accepts audio recorded at other sampling rates once it has been resampled to 16 kHz
Trained on two datasets (Arabic Speech Corpus and Common Voice) for improved robustness
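Since the model works on 16 kHz audio, recordings at other rates need to be converted first. The sketch below shows the idea with plain linear interpolation on a list of samples. It is a deliberately simple illustration (resample_linear is a hypothetical helper with no anti-aliasing filter); a real pipeline would more likely use a dedicated audio resampler.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono signal by linear interpolation between neighbours."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if src_rate == dst_rate or not samples:
        return list(samples)

    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

One second of 48 kHz audio (48,000 samples) comes out as 16,000 samples, the length the model expects for one second of input.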

Core Capabilities

Direct speech-to-text transcription for Arabic
Handles various Arabic dialects
Sampling rate conversion to 16 kHz as a preprocessing step
Batch processing support
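Batch processing of clips with different lengths generally requires padding them to a common length and tracking which positions are real audio. In the Hugging Face stack the processor handles this when called with padding=True; the pure-Python sketch below only illustrates the idea, and pad_batch is a hypothetical helper.

```python
def pad_batch(waveforms, pad_value=0.0):
    """Right-pad waveforms to equal length and build a matching 0/1 mask."""
    if not waveforms:
        return [], []
    max_len = max(len(w) for w in waveforms)
    padded, mask = [], []
    for w in waveforms:
        n_pad = max_len - len(w)
        padded.append(list(w) + [pad_value] * n_pad)
        # 1 marks real samples, 0 marks padding.
        mask.append([1] * len(w) + [0] * n_pad)
    return padded, mask
```

The mask lets the model ignore padded positions, so short clips in a batch are not transcribed as if the trailing silence were speech.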

Frequently Asked Questions

Q: What makes this model unique?

The model is trained on both the Arabic Speech Corpus and Common Voice, which is intended to make it more robust across Arabic dialects and accents. Its output uses the Buckwalter transliteration system, which is convenient for text-processing systems that work with ASCII-encoded Arabic.

Q: What are the recommended use cases?

The model suits Arabic speech recognition tasks, particularly transcription of Modern Standard Arabic. Typical applications include voice commands, transcription services, and voice-enabled Arabic interfaces.