wav2vec2-large-xlsr-kinyarwanda-apostrophied

lucio

A wav2vec2-large-xlsr-53 model fine-tuned for Kinyarwanda ASR on 25% of the Common Voice Kinyarwanda data. It is specialized in predicting apostrophes and reaches a 39.92% word error rate (WER) on the test set.

Property	Value
Base Model	facebook/wav2vec2-large-xlsr-53
Training Data	Common Voice Kinyarwanda (25%)
WER (test set)	39.92%
Author	lucio
Model Hub	Hugging Face

What is wav2vec2-large-xlsr-kinyarwanda-apostrophied?

This is an automatic speech recognition (ASR) model fine-tuned for Kinyarwanda, with a particular focus on predicting the apostrophes that mark contractions of pronouns with vowel-initial words. It is built on the wav2vec2-large-xlsr-53 architecture and was trained on a filtered subset of the Common Voice Kinyarwanda dataset: only utterances with no downvotes and shorter than 9.5 seconds were kept.
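The selection rule described above can be sketched as a simple predicate. This is a minimal illustration, not the author's preprocessing script: the `Clip` record and the helper names are hypothetical, and the clip duration is assumed to have been measured from the audio beforehand, since the Common Voice TSV metadata carries vote counts but not durations.

```python
from dataclasses import dataclass

# Hypothetical record type. Common Voice TSVs provide `path`, `sentence` and
# `down_votes`; `duration_s` is assumed to be measured from the audio file.
@dataclass
class Clip:
    path: str
    sentence: str
    down_votes: int
    duration_s: float

MAX_DURATION_S = 9.5  # length cutoff stated in the model card

def keep_clip(clip: Clip) -> bool:
    """Keep clips that have no downvotes and are shorter than 9.5 seconds."""
    return clip.down_votes == 0 and clip.duration_s < MAX_DURATION_S

def filter_clips(clips: list[Clip]) -> list[Clip]:
    """Return only the clips that pass the selection rule."""
    return [c for c in clips if keep_clip(c)]

clips = [
    Clip("a.mp3", "placeholder", 0, 4.2),
    Clip("b.mp3", "placeholder", 1, 3.0),   # dropped: has a downvote
    Clip("c.mp3", "placeholder", 0, 11.0),  # dropped: longer than 9.5 s
]
print([c.path for c in filter_clips(clips)])  # ['a.mp3']
```

The cutoff is applied as a strict inequality ("shorter than 9.5 seconds"), so a clip of exactly 9.5 s is excluded.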

Implementation Details

The model was trained on approximately 125,000 examples (25% of the available data) on a V100 GPU provided by OVHcloud. Training ran for 20 epochs on an initial block of 32k examples, then for 10 epochs on each of three further 32k-example blocks, for a total of about 60 hours.
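The block schedule above can be checked with a few lines of arithmetic. This sketch assumes "32k" means exactly 32,000 examples per block (it could also be 32,768); the schedule list and helper name are illustrative, not taken from the author's training code.

```python
# Assumption: each "32k" block holds exactly 32,000 examples.
BLOCK_SIZE = 32_000

# (block index, epochs) as described in the model card:
# 20 epochs on the first block, then 10 epochs on each of three more blocks.
SCHEDULE = [(0, 20), (1, 10), (2, 10), (3, 10)]

def training_plan(schedule: list[tuple[int, int]], block_size: int) -> tuple[int, int]:
    """Return (distinct examples seen, total example passes over all epochs)."""
    total_examples = len(schedule) * block_size
    example_passes = sum(epochs * block_size for _, epochs in schedule)
    return total_examples, example_passes

total, passes = training_plan(SCHEDULE, BLOCK_SIZE)
print(total, passes)  # 128000 1600000
```

Four blocks of 32k give about 128,000 distinct examples, in line with the "approximately 125,000" quoted above, and 50 block-epochs in total.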

Input audio must be sampled at 16 kHz
Trained on a filtered dataset that excludes downvoted utterances
Validated on 2,048 utterances from the validation set
Achieves a 39.92% WER on the test set
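For reference, WER is the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The sketch below computes a corpus-level WER over whitespace-split words; it is an illustration of the metric, and whether the reported 39.92% used exactly this tokenization and aggregation is an assumption.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two word sequences (rolling-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = curr
    return prev[-1]

def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total word edits divided by total reference words over (ref, hyp) pairs."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words

pairs = [("a b c d", "a x c"), ("e f", "e f")]
print(round(corpus_wer(pairs), 4))           # 0.3333
print(edit_distance(["n'ayo"], ["nayo"]))    # 1
```

Because words are compared as whole tokens, a word that differs from the reference only by a missing or extra apostrophe counts as a full substitution under this metric.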

Core Capabilities

Specialized in Kinyarwanda speech recognition
Apostrophe prediction for contractions of pronouns with vowel-initial words
Handles continuous speech input
Outputs text that includes apostrophized contractions
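Assuming the model uses the standard character-level CTC head of Hugging Face's `Wav2Vec2ForCTC` (where the padding token doubles as the CTC blank and `|` marks word boundaries), the apostrophe is simply one more symbol in the output vocabulary. The sketch below shows greedy CTC decoding over a small hypothetical vocabulary; the vocabulary, frame IDs and resulting letters are arbitrary and are not the model's real tokens.

```python
# Hypothetical toy vocabulary: index 0 is the CTC blank ("<pad>"),
# "|" is the word delimiter, and "'" is an ordinary output character.
VOCAB = ["<pad>", "|", "'", "a", "n", "o", "y", "z"]
BLANK = 0

def ctc_greedy_decode(frame_ids: list[int]) -> str:
    """Collapse repeated frame labels, drop blanks, and map '|' to spaces."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it changes and is not the blank.
        if idx != prev and idx != BLANK:
            chars.append(VOCAB[idx])
        prev = idx
    return "".join(chars).replace("|", " ").strip()

# Per-frame argmax IDs, as would come from the model's logits.
frames = [4, 4, 2, 2, 3, 0, 6, 5, 1, 1, 7, 5, 5]
print(ctc_greedy_decode(frames))     # n'ayo zo
print(ctc_greedy_decode([3, 0, 3]))  # aa (blank separates a genuine double letter)
```

Under this assumption, producing a contraction requires no special post-processing: the apostrophe is emitted frame by frame like any letter.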

Frequently Asked Questions

Q: What makes this model unique?

Unlike its predecessor, this model predicts the apostrophes used in Kinyarwanda contractions, which makes it useful for applications that need contractions of pronouns with vowel-initial words transcribed correctly. However, it may occasionally overgeneralize, inserting apostrophes where the reference text has none.
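One rough way to quantify apostrophe overgeneralization is to compare apostrophe-bearing words in the hypothesis against those in the reference: low precision means extra apostrophes, low recall means missed ones. This bag-of-words measure is a hypothetical evaluation aid, not something described in the model card, and the example strings are arbitrary.

```python
from collections import Counter

def apostrophe_word_counts(text: str) -> Counter:
    """Count whitespace-separated words that contain an apostrophe."""
    return Counter(w for w in text.split() if "'" in w)

def apostrophe_precision_recall(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Bag-of-words precision/recall of apostrophized words over (ref, hyp) pairs."""
    matched = predicted = expected = 0
    for ref, hyp in pairs:
        ref_c = apostrophe_word_counts(ref)
        hyp_c = apostrophe_word_counts(hyp)
        matched += sum((ref_c & hyp_c).values())  # multiset intersection
        predicted += sum(hyp_c.values())
        expected += sum(ref_c.values())
    precision = matched / predicted if predicted else 1.0
    recall = matched / expected if expected else 1.0
    return precision, recall

# The hypothesis adds a spurious apostrophe in its second word.
pairs = [("n'ayo zo", "n'ayo z'o")]
print(apostrophe_precision_recall(pairs))  # (0.5, 1.0)
```

Word order is ignored, so this complements WER rather than replacing it.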

Q: What are the recommended use cases?

The model suits Kinyarwanda speech recognition tasks where correct transcription of contractions matters, particularly contractions of pronouns with vowel-initial words. With a test WER of 39.92%, its transcripts may still need review in applications that demand high accuracy.