wav2vec2-large-xlsr-catala

softcatala

A Catalan speech recognition model fine-tuned from wav2vec2-large-xlsr-53, reaching 6.92% WER on its combined test split. It expects 16 kHz audio input.

Property	Value
License	Apache 2.0
Downloads	40,954
Primary Task	Automatic Speech Recognition
Language	Catalan

What is wav2vec2-large-xlsr-catala?

wav2vec2-large-xlsr-catala is a speech recognition model for Catalan, fine-tuned from Facebook's pretrained wav2vec2-large-xlsr-53 checkpoint on the Common Voice and ParlamentParla datasets. It transcribes Catalan speech directly to text.

Implementation Details

The model's word error rate (WER) varies by test set, as listed below. It requires 16 kHz audio input, matching the sample rate used to pretrain wav2vec2.

6.92% WER on the combined test split
12.99% WER on the Google Crowdsourced Corpus
13.23% WER on the audiobook "La llegenda de Sant Jordi"
Implemented in PyTorch
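The WER figures above count word substitutions (S), deletions (D), and insertions (I) against a reference transcript of N words: WER = (S + D + I) / N. The sketch below computes it for a single utterance via word-level edit distance. The function name is illustrative, and this is not necessarily the scoring script behind the reported numbers; corpus-level scores usually sum errors and reference words over all utterances before dividing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word out of five reference words.
print(word_error_rate("el gat dorm al sofà", "el gat dorm sofà"))  # 0.2
```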

Core Capabilities

Direct speech-to-text transcription without an external language model
Expects 16 kHz audio input
Evaluated on several Catalan speech sources, including crowdsourced recordings and an audiobook
Trained on both read speech (Common Voice) and parliamentary speech (ParlamentParla)
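Transcribing without a language model means the model's per-frame CTC outputs are decoded directly. A minimal sketch of greedy CTC decoding follows, assuming a hypothetical character vocabulary in which `<pad>` is the CTC blank and `|` separates words (a common wav2vec2 convention; the real vocabulary comes from the model's tokenizer). Frame scores are plain Python lists standing in for model logits.

```python
from typing import List, Sequence

BLANK = "<pad>"       # hypothetical CTC blank token
WORD_DELIMITER = "|"  # hypothetical word-separator token


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring vocabulary index for each frame.
    best = [max(range(len(scores)), key=lambda k: scores[k]) for scores in frame_scores]
    chars: List[str] = []
    prev = None
    for idx in best:
        # 2. Emit a label only when it differs from the previous frame's label,
        #    so consecutive repeats collapse into one.
        # 3. Blanks are never emitted, but they still update `prev`,
        #    which is how CTC represents doubled letters.
        if idx != prev and vocab[idx] != BLANK:
            chars.append(vocab[idx])
        prev = idx
    # 4. Map word delimiters to spaces and normalize whitespace.
    return " ".join("".join(chars).replace(WORD_DELIMITER, " ").split())


# Hypothetical toy vocabulary and one-hot "scores" for demonstration.
vocab = ["<pad>", "|", "a", "b", "d", "i", "l", "n", "o"]


def one_hot(index: int) -> List[float]:
    return [1.0 if j == index else 0.0 for j in range(len(vocab))]


frames = [one_hot(i) for i in [3, 3, 8, 7, 0, 1, 4, 5, 2, 2]]
print(greedy_ctc_decode(frames, vocab))  # bon dia
```

Because a blank resets the repeat check, a blank between two identical labels yields a doubled letter (as in "ll"), while adjacent identical labels collapse into one. Replacing the argmax step with beam search over a language model is the usual way to add one.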

Frequently Asked Questions

Q: What makes this model unique?

This model focuses specifically on Catalan, is fine-tuned on two datasets (Common Voice and ParlamentParla), and reports a WER of 6.92% on its combined test split without an external language model.

Q: What are the recommended use cases?

The model suits Catalan transcription tasks such as parliamentary speech processing, audiobook transcription, and general-purpose speech recognition where Catalan support is required. Audio recorded at other sample rates should be resampled to 16 kHz before transcription.
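Since the model expects 16 kHz input, it can help to check files before transcription. This sketch uses only Python's standard `wave` module and reports problems rather than resampling. The helper names are hypothetical, and the mono check is an assumption of this sketch, not something the model card states.

```python
import io
import wave
from typing import List

TARGET_RATE = 16_000  # sample rate the model expects


def wav_format_problems(data: bytes) -> List[str]:
    """Return a list of format problems; an empty list means the WAV looks usable."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    problems = []
    if rate != TARGET_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz (resample first)")
    if channels != 1:
        # Assumption: audio is fed to the model as a single (mono) channel.
        problems.append(f"{channels} channels found, expected mono")
    return problems


def make_silent_wav(rate: int, channels: int = 1, seconds: float = 0.1) -> bytes:
    """Build a short silent 16-bit PCM WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buf.getvalue()


print(wav_format_problems(make_silent_wav(16_000)))  # []
print(wav_format_problems(make_silent_wav(44_100, channels=2)))
```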