wav2vec2-large-xlsr-galician

Property	Value
Author	ifrz
License	Apache 2.0
Test WER	7.12%
Base Model	facebook/wav2vec2-large-xlsr-53
Hugging Face	Model Repository

What is wav2vec2-large-xlsr-galician?

wav2vec2-large-xlsr-galician is a speech recognition model fine-tuned for the Galician language. Built on Facebook's self-supervised wav2vec2-large-xlsr-53 model, it was fine-tuned on refined datasets from OpenSLR and Mozilla Common Voice to provide accurate speech-to-text for Galician speakers.

Implementation Details

The model requires 16 kHz mono audio input and uses the Wav2Vec2ForCTC architecture for speech recognition. It relies on CTC (Connectionist Temporal Classification), which maps the sequence of audio frames to a shorter sequence of characters without requiring a frame-level alignment between audio and text.
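As a rough illustration of how CTC output becomes text, the sketch below shows greedy decoding: take the most likely token per frame, collapse consecutive repeats, then drop the blank token. The toy vocabulary, the blank index of 0, and the function name `ctc_greedy_decode` are assumptions made for this example and are not taken from the model's actual tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive duplicate frame predictions, then remove blanks."""
    # groupby yields one key per run of identical consecutive ids
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; index 0 plays the role of the CTC blank.
vocab = {0: "<blank>", 1: "o", 2: "l", 3: "a", 4: " "}

# One predicted id per audio frame (made-up values for illustration).
frames = [1, 1, 0, 2, 2, 2, 0, 3, 3]
print(ctc_greedy_decode(frames, vocab))  # ola
```

The blank token is what lets CTC emit a doubled letter: the frames `[2, 0, 2]` decode to "ll", whereas `[2, 2]` collapses to a single "l".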

Built on the wav2vec2-large-xlsr-53 architecture
Fine-tuned on the OpenSLR 77 and Mozilla Common Voice 8.0 datasets
Achieves 7.12% Word Error Rate (WER) on test data
Expects audio sampled at 16 kHz
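For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python version under that standard definition. The function name `word_error_rate` and the example sentences are made up for illustration, and this is not the evaluation script used to obtain the 7.12% figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of four reference words.
print(word_error_rate("bos días a todos", "bos dias a todos"))  # 0.25
```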

Core Capabilities

Automatic Speech Recognition (ASR) for the Galician language
Real-time audio transcription
High accuracy, with 7.12% WER on the test set
Easy integration with the Transformers library
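A minimal sketch of what integration with Transformers might look like is shown below. Several things here are assumptions and not confirmed by this page: the repository id `ifrz/wav2vec2-large-xlsr-galician` is inferred from the author and model name, the repository is assumed to ship files loadable by `Wav2Vec2Processor`, and `librosa` is simply one convenient choice for loading audio as 16 kHz mono. The helper name `transcribe` is hypothetical.

```python
# Assumed repository id, inferred from the author and model name above.
MODEL_ID = "ifrz/wav2vec2-large-xlsr-galician"
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz mono input


def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to the requested rate and downmixes to mono.
    speech, _ = librosa.load(path, sr=TARGET_SAMPLE_RATE, mono=True)

    inputs = processor(
        speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Most likely token per frame; the processor collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` with a local recording (the file name is a placeholder) would download the model on first use and return the decoded text. For repeated calls, loading the processor and model once outside the function would avoid reloading them each time.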

Frequently Asked Questions

Q: What makes this model unique?

This model is optimized specifically for Galician speech recognition. It was fine-tuned on both the OpenSLR and Common Voice datasets for broader language coverage, and it reaches a 7.12% WER on test data.

Q: What are the recommended use cases?

The model is suited to Galician speech recognition applications, including transcription services, voice assistants, and any application requiring Galician audio-to-text conversion. Input must be 16 kHz mono audio, so recordings in other formats should be resampled and downmixed before transcription.
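Since the input format matters, it can help to check files before sending them to the model. The sketch below uses Python's standard `wave` module to report whether a WAV file is 16 kHz mono. The helper names `check_wav_format` and `write_silent_wav` are hypothetical, and the check only covers uncompressed WAV files.

```python
import os
import tempfile
import wave


def check_wav_format(path, expected_rate=16_000, expected_channels=1):
    """Return a list of format problems; an empty list means the file is fine."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"found {channels} channels, expected {expected_channels}")
    return problems


def write_silent_wav(path, rate, channels, seconds=1):
    """Write a silent 16-bit PCM WAV file, used here only for demonstration."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)


# Demonstration with a generated 44.1 kHz stereo file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "stereo_44k.wav")
    write_silent_wav(demo_path, rate=44_100, channels=2)
    for problem in check_wav_format(demo_path):
        print(problem)
```

A file that fails the check would need resampling and downmixing with an audio library before transcription.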