wav2vec2-xls-r-1b-portuguese
Property | Value |
---|---|
License | Apache 2.0 |
Author | jonatasgrosman |
Downloads | 359,756 |
Base Architecture | XLS-R Wav2Vec2 |
What is wav2vec2-xls-r-1b-portuguese?
wav2vec2-xls-r-1b-portuguese is an automatic speech recognition (ASR) model fine-tuned for Portuguese. It builds on Facebook's wav2vec2-xls-r-1b model and was fine-tuned on Common Voice 8.0, CORAA, Multilingual TEDx, and Multilingual LibriSpeech. It reports a Word Error Rate (WER) of 8.7%, which drops to 6.04% when decoding is combined with a language model.
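For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. Going from 8.7% to 6.04% is a relative reduction of roughly 31% ((8.7 - 6.04) / 8.7). The sketch below is a minimal, dependency-free illustration of the metric, not the evaluation script used to produce the reported numbers; the function names and the example sentences are made up for illustration, and real evaluations usually normalize text (casing, punctuation) first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate; here spaces count as characters (conventions vary)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: one of five reference words is dropped.
print(wer("o gato dorme no sofá", "o gato dorme sofá"))
```

In this example a single deleted word out of five reference words gives a WER of 0.2, or 20%.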
Implementation Details
The model expects audio sampled at 16 kHz and uses the XLS-R architecture for acoustic modeling. It was fine-tuned with the HuggingSound tool.
- Supports both standard inference and language-model-enhanced transcription
- Achieves a 2.55% Character Error Rate (CER) on test data
- Reaches 18.8% WER on the more challenging Robust Speech Event test data
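Because the model expects 16 kHz input, it can help to check the sampling rate of a recording before transcription if your pipeline does not resample for you. The following is a minimal sketch using only Python's standard `wave` module, so it handles uncompressed PCM WAV files only; the helper names are hypothetical, and actual resampling would need an audio library of your choice.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sampling rate stated in the model description


def wav_sample_rate(path):
    """Return the sampling rate (Hz) recorded in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE_HZ):
    """True when the file's sampling rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

A file recorded at 44.1 kHz, for instance, would be flagged by `needs_resampling` and should be converted to 16 kHz before being passed to the model.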
Core Capabilities
- High-accuracy Portuguese speech recognition
- Batch processing of audio files
- Support for various audio formats
- Easy integration with both HuggingSound and custom inference scripts
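As a rough illustration of batch transcription through HuggingSound, the sketch below splits a list of audio paths into fixed-size chunks and transcribes each chunk. It assumes the `huggingsound` package is installed and that the model is published on the Hugging Face Hub as `jonatasgrosman/wav2vec2-xls-r-1b-portuguese` (an identifier inferred from the author and model name above). The helpers `chunked` and `transcribe_in_batches` and the default batch size are hypothetical, not part of the library.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def transcribe_in_batches(audio_paths, batch_size=8,
                          model_id="jonatasgrosman/wav2vec2-xls-r-1b-portuguese"):
    """Return (path, transcription) pairs for every file in `audio_paths`.

    Assumption: `model_id` is the Hub identifier of this model.
    """
    # Imported lazily so `chunked` stays usable without the dependency installed.
    from huggingsound import SpeechRecognitionModel

    model = SpeechRecognitionModel(model_id)
    results = []
    for batch in chunked(audio_paths, batch_size):
        # transcribe() takes a list of file paths and returns one dict per file.
        for path, result in zip(batch, model.transcribe(batch)):
            results.append((path, result["transcription"]))
    return results
```

Calling `transcribe_in_batches(["/path/to/file.mp3", "/path/to/another_file.wav"])` downloads the model on first use, so expect a large initial download for a 1B-parameter checkpoint. Chunking the input keeps memory use bounded when the file list is long.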
Frequently Asked Questions
Q: What makes this model unique?
The model is fine-tuned on several diverse Portuguese speech datasets and reports low error rates, which should make it robust in real-world applications. Optional language model decoding adds flexibility for different use cases.
Q: What are the recommended use cases?
The model is best suited to Portuguese speech transcription, particularly where accuracy matters most. Typical applications include automated transcription services, subtitle generation, and voice command systems for Portuguese speakers.