fonxlsr: Wav2Vec2 for Fon Language Recognition

Property	Value
Model Base	wav2vec2-large-xlsr-53
Task	Speech Recognition
Language	Fon (Fongbe)
Training Data	8,235 samples
Model Author	Chris C. Emezue & Bonaventure F.P. Dossou
Performance	14.97% WER on test set

What is fonxlsr?

fonxlsr is a speech recognition model for the Fon language, fine-tuned from Facebook's wav2vec2-large-xlsr-53 checkpoint. It transcribes Fon speech to text and reaches a 14.97% word error rate (WER) on its test set.

Implementation Details

The model takes 16 kHz audio as input and is trained with CTC (Connectionist Temporal Classification), so it emits one token prediction per audio frame, and those predictions are then collapsed into text. The dataset is split into 8,235 training samples, 1,107 validation samples, and 1,061 test samples.

Built on the wav2vec2-large-xlsr-53 architecture
Requires audio sampled at 16 kHz
Implements direct transcription without requiring a language model
Supports batch processing for efficient inference
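Transcription without a language model usually means greedy (best-path) CTC decoding: take the highest-scoring token in each frame, merge consecutive repeats, and drop the blank token. The sketch below illustrates that idea in plain Python. It is not the model's actual inference code, and the vocabulary, the blank id of 0, and the "|" word delimiter are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    id_to_token: Dict[int, str],
    blank_id: int = 0,           # assumed id of the CTC blank token
    word_delimiter: str = "|",   # assumed token that marks word boundaries
) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring token id for every frame.
    best_ids: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    tokens: List[str] = []
    previous_id: Optional[int] = None
    for token_id in best_ids:
        # Emit a token only when the id changes and it is not the blank.
        if token_id != previous_id and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous_id = token_id

    # Turn word delimiters into spaces to obtain the final transcript.
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

A blank frame between two identical tokens keeps them separate, which is how CTC can represent doubled letters.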

Core Capabilities

Direct speech-to-text transcription for the Fon language
Achieves 14.97% Word Error Rate on test set
Handles various speech patterns and accents within Fon
Efficient preprocessing and inference pipeline
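For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition. The function name is hypothetical, and it is not the evaluation script used to obtain the reported 14.97%.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words: List[str] = reference.split()
    hyp_words: List[str] = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row

    return previous_row[-1] / len(ref_words)
```

With a four-word reference, one substituted word and one missing word give a WER of 0.5. Because insertions count as errors, the value can exceed 1.0.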

Frequently Asked Questions

Q: What makes this model unique?

This model is one of the first speech recognition systems built specifically for the Fon language, and it continues the OkwuGbé research on speech recognition for African languages. It reaches a 14.97% WER on its test set without an external language model.

Q: What are the recommended use cases?

The model is suited to transcribing Fon speech in applications such as automated transcription services, language preservation efforts, educational tools, and research in African linguistics. Audio should be sampled at 16 kHz, the rate the model expects, so recordings at other rates need resampling first.
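Recordings captured at another rate have to be converted before inference. The sketch below shows the idea with plain linear interpolation. The function name is hypothetical, and this is a deliberately simple stand-in: it applies no anti-aliasing filter, so for real downsampling a dedicated audio library's resampler is the safer choice.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # sampling rate the model expects


def resample_linear(
    samples: Sequence[float],
    source_rate: int,
    target_rate: int = TARGET_RATE,
) -> List[float]:
    """Resample a mono signal by linear interpolation (no anti-aliasing)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples or source_rate == target_rate:
        return list(samples)

    # Number of output samples that covers the same duration.
    out_len = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples advanced per output sample
    last = len(samples) - 1

    resampled: List[float] = []
    for n in range(out_len):
        position = n * step
        left = min(int(position), last)
        right = min(left + 1, last)
        fraction = position - left
        # Blend the two neighbouring input samples.
        resampled.append(samples[left] * (1.0 - fraction) + samples[right] * fraction)
    return resampled
```

For example, a 32 kHz recording comes out with half as many samples, and an 8 kHz recording with twice as many.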
