wav2vec2-xls-r-1b-portuguese

jonatasgrosman

XLS-R 1B-based speech recognition model for Portuguese, fine-tuned on Common Voice 8.0 and other Portuguese corpora. Achieves 8.7% WER, which drops to 6.04% with a language model (LM). Nearly 360K downloads.

Property	Value
License	Apache 2.0
Author	jonatasgrosman
Downloads	359,756
Base Architecture	XLS-R Wav2Vec2

What is wav2vec2-xls-r-1b-portuguese?

This is a speech recognition model fine-tuned for Portuguese. It starts from Facebook's pretrained wav2vec2-xls-r-1b checkpoint and was fine-tuned on several Portuguese datasets: Common Voice 8.0, CORAA, Multilingual TEDx, and Multilingual LibriSpeech. It reaches a word error rate (WER) of 8.7%, which improves to 6.04% when decoding is combined with a language model.
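WER counts the word substitutions, insertions, and deletions needed to turn the model's transcript into the reference, divided by the number of reference words; CER is the same ratio computed over characters. A minimal pure-Python sketch of both metrics follows. The function names are illustrative, and the sketch applies no text normalization (lowercasing, punctuation stripping), so it is not necessarily how the reported numbers were computed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("o gato preto dorme", "o gato branco dorme")` returns 0.25: one substituted word out of four reference words.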

Implementation Details

The model expects audio sampled at 16 kHz, so recordings at other sampling rates must be resampled before inference. It uses the XLS-R Wav2Vec2 architecture as its acoustic model and was fine-tuned for Portuguese speech recognition with the HuggingSound tool.
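Since the model only accepts 16 kHz input, audio recorded at other rates (for example 44.1 kHz or 8 kHz) has to be converted first. The sketch below shows the idea with plain linear interpolation in NumPy. It applies no anti-aliasing filter, so for real use a library resampler such as `librosa.load(path, sr=16_000)`, `torchaudio.functional.resample`, or `scipy.signal.resample_poly` is a better choice. The function name `resample_linear` is hypothetical.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate the model expects


def resample_linear(waveform: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample a mono waveform by linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr:
        return waveform.astype(np.float32)
    duration = len(waveform) / orig_sr
    n_out = int(round(duration * target_sr))
    # Time stamps (in seconds) of the input samples and of the desired output samples
    t_in = np.arange(len(waveform)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, waveform).astype(np.float32)


# One second of a 440 Hz tone recorded at 44.1 kHz becomes 16,000 samples
tone = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)
resampled = resample_linear(tone, 44_100)
```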

Supports both standard inference and language-model-enhanced transcription
Achieves a 2.55% character error rate (CER) on test data
Reaches 18.8% WER on the more challenging Robust Speech Event test data
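The model emits one score vector per audio frame and is decoded with CTC. Standard inference can use greedy (best-path) decoding: take the most likely token per frame, merge consecutive repeats, then drop blanks. LM-enhanced transcription typically replaces this with a beam search that rescores candidate transcripts using a language model. A minimal NumPy sketch of the greedy step follows. The six-symbol vocabulary is a toy assumption; it follows the Wav2Vec2 CTC convention of using the pad token as the blank and "|" as the word delimiter. With Hugging Face Transformers, calling `processor.batch_decode` on the argmax ids performs this step.

```python
import numpy as np

# Toy vocabulary, assumed for illustration: index 0 is the CTC blank,
# "|" marks word boundaries.
VOCAB = ["<pad>", "|", "a", "c", "o", "s"]
BLANK_ID = 0


def ctc_greedy_decode(logits: np.ndarray, vocab=VOCAB, blank_id: int = BLANK_ID) -> str:
    """Decode a (time, vocab) matrix of CTC scores with greedy best-path decoding."""
    best_path = logits.argmax(axis=-1)  # most likely token id for each frame
    chars = []
    previous = -1
    for token_id in best_path:
        # Keep a token only if it differs from the previous frame and is not blank
        if token_id != previous and token_id != blank_id:
            chars.append(vocab[token_id])
        previous = token_id
    # Turn word delimiters into spaces
    return "".join(chars).replace("|", " ").strip()


# One-hot "logits" whose best path spells c c a _ s s a | c a s a
frames = np.eye(len(VOCAB))[[3, 3, 2, 0, 5, 5, 2, 1, 3, 2, 5, 2]]
text = ctc_greedy_decode(frames)
```

Because repeats are merged before blanks are removed, a blank between two identical tokens is what allows double letters: the path `o s _ s o` decodes to "osso", while `o s s o` decodes to "oso".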

Core Capabilities

High-accuracy Portuguese speech recognition
Batch processing of audio files
Support for various audio formats, once decoded and resampled to 16 kHz
Easy integration with both HuggingSound and custom inference scripts
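Batch processing runs several clips through the model at once. Clips differ in length, so they are padded to a common length and paired with an attention mask that marks which samples are real. This is roughly what a Wav2Vec2 processor in Hugging Face Transformers does when called with `padding=True`. A minimal NumPy sketch of that step follows. The name `pad_batch` is hypothetical, and zero padding is an assumption.

```python
import numpy as np


def pad_batch(waveforms, padding_value: float = 0.0):
    """Stack variable-length 1-D waveforms into one (batch, max_len) array.

    Returns the padded batch and an attention mask with 1 for real samples
    and 0 for padding, so the model can ignore the padded tail.
    """
    max_len = max(len(w) for w in waveforms)
    batch = np.full((len(waveforms), max_len), padding_value, dtype=np.float32)
    mask = np.zeros((len(waveforms), max_len), dtype=np.int64)
    for i, w in enumerate(waveforms):
        batch[i, : len(w)] = w
        mask[i, : len(w)] = 1
    return batch, mask


# Two clips: one second and half a second of audio at 16 kHz
clips = [np.ones(16_000), np.ones(8_000)]
batch, mask = pad_batch(clips)
```

Here `batch` has shape (2, 16000), and the second row's last 8,000 positions are padding, marked 0 in `mask`.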

Frequently Asked Questions

Q: What makes this model unique?

This model was fine-tuned on four Portuguese speech corpora rather than a single dataset, and it reports low error rates on its test data (8.7% WER and 2.55% CER). Its WER rises to 18.8% on the harder Robust Speech Event test data, so accuracy varies with the kind of audio. Optional language-model decoding lowers WER to 6.04%, which gives flexibility for different use cases.

Q: What are the recommended use cases?

The model is well suited to Portuguese speech transcription, particularly where accuracy matters. Typical applications include automated transcription services, subtitle generation, and voice command systems for Portuguese speakers.