wav2vec2-large-xlsr-catala
Property | Value |
---|---|
License | Apache 2.0 |
Downloads | 40,954 |
Primary Task | Automatic Speech Recognition |
Language | Catalan |
What is wav2vec2-large-xlsr-catala?
wav2vec2-large-xlsr-catala is a speech recognition model for the Catalan language. It is fine-tuned from Facebook's wav2vec2-large-xlsr-53 model on the Common Voice and ParlamentParla datasets to provide speech-to-text transcription of Catalan audio.
Implementation Details
The model takes 16 kHz audio as input and is implemented in PyTorch on top of the wav2vec2 architecture. Its reported Word Error Rates (WER) are:
- 6.92% on the combined test split
- 12.99% on the Google Crowdsourced Corpus
- 13.23% on the "La llegenda de Sant Jordi" audiobook
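For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that calculation, not the evaluation script behind the figures above; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from any test set):
# one dropped word out of four reference words.
print(word_error_rate("bon dia a tothom", "bon dia tothom"))  # 0.25
```

Under this definition, a WER of 6.92% corresponds to roughly 7 word-level errors per 100 reference words.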
Core Capabilities
- Direct speech-to-text transcription without a language model
- Expects audio sampled at 16 kHz
- Evaluated on crowdsourced recordings and audiobook narration in addition to its combined test split
- Suitable for both formal and informal speech recognition tasks
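Transcribing without a language model typically means greedy CTC decoding: take the most likely token for each audio frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that step on a toy vocabulary. The vocabulary, the blank ID of 0, and the `|` word delimiter are assumptions chosen for illustration and are not read from this model's tokenizer.

```python
from typing import Mapping, Sequence


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    id_to_token: Mapping[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Collapse repeated frame predictions and remove CTC blanks."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when the prediction changes and is not blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    # The delimiter token marks word boundaries in the output text.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary and per-frame argmax IDs.
toy_vocab = {0: "<pad>", 1: "|", 2: "b", 3: "o", 4: "n", 5: "d", 6: "i", 7: "a"}
frames = [2, 2, 0, 3, 3, 4, 0, 1, 1, 5, 6, 0, 7, 7]
print(ctc_greedy_decode(frames, toy_vocab))  # bon dia
```

A blank between two identical predictions keeps them as separate letters, which is how doubled characters survive the collapse step.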
Frequently Asked Questions
Q: What makes this model unique?
This model is fine-tuned specifically for Catalan on two datasets, Common Voice and ParlamentParla, and reports a WER of 6.92% on the combined test split.
Q: What are the recommended use cases?
The model suits Catalan transcription tasks such as parliamentary speech processing, audiobook transcription, and general-purpose speech recognition. Because it expects 16 kHz input, audio recorded at other sample rates should be resampled before transcription.
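To make the 16 kHz requirement concrete, the sketch below resamples a list of samples by linear interpolation. It is a dependency-free illustration only: it applies no anti-aliasing filter, so a dedicated resampler from an audio library is the better choice for real recordings. The function name `resample_linear` is a hypothetical helper, not part of any library.

```python
from typing import List, Sequence


def resample_linear(
    samples: Sequence[float],
    source_rate: int,
    target_rate: int = 16_000,
) -> List[float]:
    """Resample mono audio by linear interpolation (no anti-aliasing)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    if source_rate == target_rate:
        return list(samples)

    n_out = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # source samples per output sample
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        pos = k * step               # fractional position in the source
        lo = min(int(pos), last)     # nearest source sample at or below pos
        hi = min(lo + 1, last)       # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of 48 kHz silence becomes 16,000 samples.
print(len(resample_linear([0.0] * 48_000, 48_000)))  # 16000
```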