wav2vec2-large-xlsr-open-brazilian-portuguese-v2

Maintained By
lgris


License: Apache 2.0
Paper: Research Paper
Author: lgris
Test WER: 10.69%

What is wav2vec2-large-xlsr-open-brazilian-portuguese-v2?

This is a speech recognition model for Brazilian Portuguese, fine-tuned using the wav2vec 2.0 architecture. It was trained on a combination of five Portuguese speech datasets: CETUC (145 hours), Multilingual LibriSpeech (MLS, 284 hours), VoxForge, Common Voice 6.1, and Lapsbm. Combining these corpora exposes the model to a broader mix of speakers and recording conditions than any one of them offers alone.

Implementation Details

The model is built on the wav2vec 2.0 architecture. It was fine-tuned with fairseq and then converted for use with Hugging Face Transformers. It reaches a Word Error Rate (WER) of 10.69% on the Common Voice test set but 34.53% on out-of-domain TEDx data, so accuracy is considerably better on speech similar to its training data than on unseen domains.
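WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words: (substitutions + deletions + insertions) / N. The following is a minimal sketch of that computation in plain Python. It assumes simple whitespace tokenization with no casing or punctuation normalization, which published benchmarks usually apply first, so it will not necessarily reproduce the figures above exactly. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (S + D + I) / N via word-level edit distance.

    Assumption: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word out of five reference words gives a WER of 0.2.
print(word_error_rate("o gato subiu no telhado", "o gato subiu telhado"))
```

Read this way, a WER of 10.69% corresponds to roughly 10.7 word errors per 100 reference words.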

  • Trained on five datasets totaling over 400 hours of speech (CETUC and MLS alone contribute about 429 hours)
  • Expects 16 kHz audio input
  • Uses CTC (Connectionist Temporal Classification) for transcription
  • Targeted at the Brazilian variant of Portuguese
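A CTC model emits one token per audio frame, including a special blank token, and the transcript is obtained by merging consecutive repeated tokens and then dropping blanks. The sketch below shows greedy (argmax) CTC decoding on a list of per-frame tokens. The blank name `<pad>` and the word delimiter `|` are assumptions based on common Hugging Face wav2vec 2.0 vocabularies; check this model's own vocabulary before relying on them. The function name is hypothetical.

```python
from itertools import groupby

# Assumed token names; verify against the model's tokenizer vocabulary.
BLANK = "<pad>"
WORD_DELIMITER = "|"


def ctc_greedy_decode(frame_tokens: list[str], blank: str = BLANK,
                      word_delimiter: str = WORD_DELIMITER) -> str:
    """Collapse a per-frame argmax token sequence into text.

    CTC rule: merge consecutive repeats first, then remove blanks.
    """
    collapsed = [tok for tok, _ in groupby(frame_tokens)]
    chars = [tok for tok in collapsed if tok != blank]
    text = "".join(chars).replace(word_delimiter, " ")
    return " ".join(text.split())  # normalize stray spaces


frames = ["o", "o", "<pad>", "|", "g", "a", "<pad>", "t", "t", "o"]
print(ctc_greedy_decode(frames))
```

The blank is what lets CTC spell double letters: `["c", "a", "r", "<pad>", "r", "o"]` decodes to "carro", while `["c", "a", "r", "r", "o"]` collapses to "caro". In practice, `Wav2Vec2Processor.batch_decode` in Transformers applies an equivalent collapse to argmax token IDs, so this sketch is for illustration only.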

Core Capabilities

  • Automatic Speech Recognition for Brazilian Portuguese
  • Handles varied speaking styles and accents
  • Trained on a mix of read speech (CETUC), audiobooks (MLS), and crowd-sourced recordings (Common Voice)
  • Processes audio sampled at 16 kHz
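Because the model expects 16 kHz input, it helps to check the sample rate before transcription instead of silently feeding mismatched audio. The sketch below uses only the Python standard library to read a WAV file and return float samples in [-1, 1). It assumes 16-bit PCM mono WAV input and rejects anything else rather than resampling. The function name `load_wav_16k_mono` is hypothetical.

```python
import struct
import wave

TARGET_RATE = 16_000  # the model expects 16 kHz input


def load_wav_16k_mono(path: str) -> list[float]:
    """Read a 16-bit PCM mono WAV file and return samples scaled to [-1, 1).

    Raises ValueError on a format mismatch instead of resampling.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        if rate != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz, got {rate} Hz")
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)  # WAV PCM data is little-endian
    return [s / 32768.0 for s in samples]
```

Audio recorded at other rates, such as 44.1 or 48 kHz, needs to be resampled first with an audio library such as torchaudio or librosa. The resulting sample list can then be passed to the model's processor together with `sampling_rate=16000`.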

Frequently Asked Questions

Q: What makes this model unique?

The model is trained on five different Portuguese speech datasets rather than a single corpus. The combination of academic (CETUC), crowd-sourced (Common Voice), and audiobook (MLS) data gives broad coverage of speaking styles and recording contexts, although its 34.53% WER on TEDx shows that performance still drops on out-of-domain speech.

Q: What are the recommended use cases?

The model is best suited for Brazilian Portuguese speech recognition in controlled environments, particularly for transcribing clear speech. It performs best on in-domain data similar to its training sets, which makes it a reasonable fit for voice commands, transcription services, and automated subtitling of clearly recorded Brazilian Portuguese content.
