wav2vec2-large-xlsr-kinyarwanda-apostrophied
Property | Value |
---|---|
Base Model | facebook/wav2vec2-large-xlsr-53 |
Training Data | Common Voice Kinyarwanda (25%) |
WER | 39.92% |
Author | lucio |
Model Hub | HuggingFace |
What is wav2vec2-large-xlsr-kinyarwanda-apostrophied?
This is an automatic speech recognition (ASR) model fine-tuned for Kinyarwanda that also predicts the apostrophes marking contractions of pronouns with vowel-initial words. It is built on facebook/wav2vec2-large-xlsr-53 and was trained on a filtered subset of the Common Voice dataset: only utterances with no downvotes and a duration shorter than 9.5 seconds were selected.
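The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the author's preprocessing script: the `Utterance` record and its `duration_s` field are hypothetical, `down_votes` is assumed to mirror the Common Voice vote column, and the sample clips are made up.

```python
from dataclasses import dataclass

# Duration limit taken from the description above.
MAX_DURATION_S = 9.5


@dataclass
class Utterance:
    """Hypothetical record for one Common Voice clip."""
    path: str
    down_votes: int
    duration_s: float


def keep(u: Utterance) -> bool:
    """Keep clips with no downvotes that are shorter than the limit."""
    return u.down_votes == 0 and u.duration_s < MAX_DURATION_S


# Made-up sample clips, for illustration only.
clips = [
    Utterance("a.mp3", down_votes=0, duration_s=4.2),   # kept
    Utterance("b.mp3", down_votes=1, duration_s=3.0),   # dropped: downvoted
    Utterance("c.mp3", down_votes=0, duration_s=11.7),  # dropped: too long
]

selected = [u for u in clips if keep(u)]
print([u.path for u in selected])
```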
Implementation Details
The model was trained on approximately 125,000 examples (25% of available data) using a V100 GPU provided by OVHcloud. The training process involved 20 epochs on an initial block of 32k examples, followed by 10 epochs each on three additional blocks of 32k examples, totaling about 60 hours of training time.
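As a rough check on the schedule above, the arithmetic can be written out. The sketch assumes "32k" means 32,000 examples per block (it could equally mean 32,768), and the variable names are illustrative only.

```python
# Assumption: "32k" is read as 32,000 examples per block.
BLOCK_SIZE = 32_000

# Epochs per block: 20 on the first block, then 10 on each of three more.
epochs_per_block = [20, 10, 10, 10]

# Distinct examples seen across all blocks, in line with the
# "approximately 125,000" figure quoted above.
total_examples = BLOCK_SIZE * len(epochs_per_block)

# Total example passes (each example counted once per epoch it is seen in).
example_passes = sum(epochs * BLOCK_SIZE for epochs in epochs_per_block)

print(total_examples, example_passes)
```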
- Input audio must be sampled at 16 kHz
- Trained on a filtered dataset that excludes downvoted utterances
- Validated on 2,048 utterances from the validation set
- Achieves 39.92% WER on the test set
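To make the WER figure concrete, the sketch below computes word error rate as word-level edit distance divided by the number of reference words. It is a generic implementation, not the evaluation script used for this model, and the example strings are placeholder tokens rather than real Kinyarwanda. It shows why apostrophes matter: a dropped apostrophe turns the whole word into an error.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


# Placeholder tokens: "b'c" stands in for a contraction with an apostrophe.
print(wer("a b'c d", "a b'c d"))  # exact match
print(wer("a b'c d", "a bc d"))   # missing apostrophe: one substitution
print(wer("a b'c d", "a b c d"))  # contraction split into two words
```

Because WER is computed over whole words, a transcript that is correct apart from the apostrophe is penalized as heavily as one with a wrong word.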
Core Capabilities
- Specialized in Kinyarwanda speech recognition
- Predicts apostrophes for pronoun contractions, though it may occasionally overgeneralize them
- Handles continuous speech input
- Produces text output with the corresponding contractions
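A transcription call might look like the sketch below, which resamples the input to the required 16 kHz before decoding. It assumes `torch`, `torchaudio`, and `transformers` are installed, and that the model is published on the Hugging Face Hub under an ID formed from the author and model name given above. That ID is an assumption, so check it against the Hub before relying on it.

```python
# Assumed Hub ID (author + model name from this page); not verified here.
MODEL_ID = "lucio/wav2vec2-large-xlsr-kinyarwanda-apostrophied"
TARGET_SR = 16_000  # the model expects 16 kHz input


def transcribe(path: str) -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    # Heavy dependencies are imported lazily so that defining this
    # function does not require them to be installed.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    waveform, sample_rate = torchaudio.load(path)  # shape: (channels, samples)
    waveform = waveform.mean(dim=0)                # downmix to mono
    if sample_rate != TARGET_SR:
        waveform = torchaudio.functional.resample(waveform, sample_rate, TARGET_SR)

    inputs = processor(
        waveform.numpy(), sampling_rate=TARGET_SR, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: most likely token per frame, collapsed by the processor.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("clip.wav")` on a local recording would return the decoded text; the file name is a placeholder.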
Frequently Asked Questions
Q: What makes this model unique?
Unlike its predecessor, this model predicts apostrophes in Kinyarwanda text, which makes it useful for applications that need contractions of pronouns transcribed accurately. It may occasionally overgeneralize apostrophe usage, however.
Q: What are the recommended use cases?
The model suits Kinyarwanda speech recognition tasks where contractions of pronouns with vowel-initial words must appear correctly in the transcript. Its reported test-set WER of 39.92% should be weighed against the accuracy the application requires.