fonxlsr

chrisjay

A Wav2Vec2-Large-XLSR-53 model fine-tuned for Fon speech recognition. It was trained on 8,235 samples and reaches a 14.97% word error rate (WER) on the test set.

Property       Value
Model Base     wav2vec2-large-xlsr-53
Task           Speech Recognition
Language       Fon (Fongbe)
Training Data  8,235 samples
Model Author   Chris C. Emezue & Bonaventure F.P. Dossou
Performance    14.97% WER on test set

What is fonxlsr?

fonxlsr is a speech recognition model fine-tuned from Facebook's wav2vec2-large-xlsr-53 checkpoint for the Fon language (Fongbe). It converts Fon speech into text and reports a 14.97% word error rate (WER) on its test set.

Implementation Details

The model takes 16kHz audio as input and is trained with a CTC (Connectionist Temporal Classification) objective, which is a training loss and decoding scheme applied on top of the wav2vec2 encoder rather than an architecture of its own. The dataset was split into 8,235 training samples, 1,107 validation samples, and 1,061 test samples.

  • Built on wav2vec2-large-xlsr-53 architecture
  • Requires 16kHz audio input sampling rate
  • Implements direct transcription without requiring a language model
  • Supports batch processing for efficient inference
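
Transcription without a language model usually means greedy CTC decoding: take the most likely token for each audio frame, merge consecutive repeats, then drop the blank token. The sketch below illustrates that step in plain Python. The function name, the toy vocabulary, and the blank id of 0 are assumptions made for illustration; the actual model ships its own vocabulary and decoder.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse a per-frame argmax sequence into text (greedy CTC decoding)."""
    # Step 1: merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, whose only job is to separate repeats.
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    return "".join(id_to_char[token_id] for token_id in kept)


# Hypothetical toy vocabulary (assumption): id 0 is the CTC blank.
vocab = {0: "<blank>", 1: "a", 2: "b", 3: " "}

# One predicted id per audio frame. The blank between the two runs of 1
# is what lets the decoder emit "aa" instead of a single "a".
frames = [1, 1, 0, 1, 2, 2, 0, 0, 3, 2]
print(ctc_greedy_decode(frames, vocab))  # prints: aab b
```

Because each frame is decoded independently, a batch of clips can be decoded by applying the same function to each row of predictions.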

Core Capabilities

  • Direct speech-to-text transcription for Fon language
  • Achieves 14.97% Word Error Rate on test set
  • Handles various speech patterns and accents within Fon
  • Efficient preprocessing and inference pipeline
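
For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition; the function name is hypothetical, and the example sentences are placeholders rather than Fon data. How the 14.97% figure was computed is not stated here, so this is not a reproduction of that evaluation.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder sentences: one substitution and one deletion over 4 words.
print(word_error_rate("a b c d", "a x c"))  # prints: 0.5
```

Over a whole test set, WER is normally computed by summing errors and reference words across all utterances before dividing, not by averaging per-utterance scores.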

Frequently Asked Questions

Q: What makes this model unique?

This model is one of the first speech recognition systems built specifically for the Fon language, and it continues the OkwuGbé research on speech recognition for African languages. It reaches a 14.97% WER on its test set, which may be accurate enough for some practical applications.

Q: What are the recommended use cases?

The model is suited to transcribing Fon speech in applications such as automated transcription services, language preservation efforts, educational tools, and research in African linguistics. All audio input should be sampled at 16kHz, the rate the model expects; audio recorded at another rate should be resampled before inference.
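
As an illustration of the 16kHz requirement, the sketch below converts a list of samples from another rate using linear interpolation. The function name is hypothetical, and the method is deliberately naive: it applies no anti-aliasing filter, so for real recordings a library resampler (for example librosa.resample or torchaudio.transforms.Resample) is the safer choice.

```python
def resample_linear(samples, source_rate, target_rate=16_000):
    """Resample a mono signal by linear interpolation (no anti-aliasing)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)

    # Number of output samples covering the same duration.
    n_out = max(1, round(len(samples) * target_rate / source_rate))
    # Distance between output samples, measured in input samples.
    step = source_rate / target_rate

    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two nearest input samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# A 48kHz ramp of 48 samples becomes 16 samples at 16kHz.
ramp = [float(i) for i in range(48)]
print(len(resample_linear(ramp, 48_000)))  # prints: 16
```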
