wav2vec2-large-xlsr-galician
Property | Value |
---|---|
Author | ifrz |
License | Apache 2.0 |
Test WER | 7.12% |
Base Model | facebook/wav2vec2-large-xlsr-53 |
Hugging Face | Model Repository |
What is wav2vec2-large-xlsr-galician?
wav2vec2-large-xlsr-galician is an automatic speech recognition model for the Galician language. It is a fine-tuned version of Facebook's self-supervised wav2vec2-large-xlsr-53 model, trained on refined datasets from OpenSLR and Mozilla Common Voice to transcribe Galician speech to text.
Implementation Details
The model requires 16 kHz mono audio input and uses the Wav2Vec2ForCTC architecture. Its output layer is trained with CTC (Connectionist Temporal Classification), which maps the sequence of audio frames to a character sequence without requiring a frame-level alignment between audio and text.
- Built on wav2vec2-large-xlsr-53 architecture
- Fine-tuned with OpenSLR 77 and Mozilla Common Voice 8.0 datasets
- Achieves 7.12% Word Error Rate (WER) on test data
- Expects audio sampled at 16 kHz
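To make the CTC step concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python: take the most likely token per frame, collapse consecutive repeats, then drop the blank token. The function name `ctc_greedy_decode`, the toy vocabulary, and the blank index are illustrative assumptions and are not taken from this model's tokenizer.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: List[str], blank_id: int = 0) -> str:
    """Collapse repeated per-frame predictions, then remove blanks."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the CTC blank.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Hypothetical toy vocabulary; index 0 plays the role of the CTC blank.
VOCAB = ["<pad>", "o", "l", "a"]

# Per-frame argmax ids corresponding to: o o _ l l _ a
frames = [1, 1, 0, 2, 2, 0, 3]
print(ctc_greedy_decode(frames, VOCAB))  # ola
```

A blank between two identical ids keeps both characters, which is how CTC represents doubled letters: `[2, 0, 2]` decodes to `ll` with the vocabulary above.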
Core Capabilities
- Automatic Speech Recognition (ASR) for the Galician language
- Real-time audio transcription
- High accuracy, with a 7.12% WER
- Easy integration with the Transformers library
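As a rough illustration of the Transformers integration, the sketch below loads the model and transcribes one audio file. It makes several assumptions: the repository ID `ifrz/wav2vec2-large-xlsr-galician` is inferred from the author and model name in the table above and is not confirmed here, `librosa` is used for audio loading only as one convenient option, and the `transcribe` helper is a hypothetical name.

```python
# Assumed repository ID, inferred from the author and model name.
MODEL_ID = "ifrz/wav2vec2-large-xlsr-galician"
TARGET_SR = 16_000  # the model expects 16 kHz mono input


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe a single audio file with greedy CTC decoding."""
    # Heavy dependencies are imported here so that merely importing
    # this module does not require them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to TARGET_SR and downmixes to mono on load.
    speech, _ = librosa.load(audio_path, sr=TARGET_SR, mono=True)

    inputs = processor(speech, sampling_rate=TARGET_SR, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; the processor collapses
    # repeats and removes blanks when decoding.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder path, would return the transcription as a string. For repeated calls it would be better to load the processor and model once and reuse them.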
Frequently Asked Questions
Q: What makes this model unique?
This model is optimized specifically for Galician speech recognition. It is fine-tuned on both the OpenSLR and Common Voice datasets for broader coverage of the language and reaches a 7.12% WER on the test set.
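For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation script used for this model; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of four.
print(word_error_rate("bos días a todos", "bos día a todos"))  # 0.25
```

On this scale, a WER of 7.12% corresponds to roughly 7 word errors per 100 reference words.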
Q: What are the recommended use cases?
The model suits Galician speech recognition applications such as transcription services, voice assistants, and other Galician audio-to-text conversion. Input must be 16 kHz mono audio, so recordings in other formats need to be resampled and downmixed first.
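Because the input format matters, a quick pre-flight check can catch incompatible recordings before inference. The following sketch uses only Python's standard `wave` module and therefore handles PCM WAV files only; the helper name `is_model_ready` is hypothetical.

```python
import wave

REQUIRED_RATE = 16_000    # Hz
REQUIRED_CHANNELS = 1     # mono


def is_model_ready(wav_path: str) -> bool:
    """Return True if the WAV file is already 16 kHz mono."""
    with wave.open(wav_path, "rb") as wav:
        return (
            wav.getframerate() == REQUIRED_RATE
            and wav.getnchannels() == REQUIRED_CHANNELS
        )
```

Files that fail this check should be resampled to 16 kHz and downmixed to a single channel with an audio library before being passed to the model.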