wav2vec2-large-xlsr-galician

Maintained by: ifrz

Property       Value
Author         ifrz
License        Apache 2.0
Test WER       7.12%
Base Model     facebook/wav2vec2-large-xlsr-53
Hugging Face   Model Repository

What is wav2vec2-large-xlsr-galician?

wav2vec2-large-xlsr-galician is an automatic speech recognition model fine-tuned for the Galician language. It builds on Facebook's self-supervised wav2vec2-large-xlsr-53 model and was fine-tuned on data from OpenSLR and Mozilla Common Voice to provide speech-to-text for Galician speakers.

Implementation Details

The model expects 16 kHz mono audio input and uses the Wav2Vec2ForCTC architecture. It relies on CTC (Connectionist Temporal Classification), which maps the long sequence of audio frames to a shorter sequence of characters without requiring a frame-by-frame alignment between audio and text.
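To make the CTC step concrete, here is a minimal sketch of greedy CTC decoding in plain Python: take the most likely token per audio frame, merge consecutive repeats, then drop the blank token. The vocabulary and the blank index below are hypothetical toy values chosen for illustration, not the model's actual vocabulary.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse per-frame token ids into text: merge repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive ids.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary (assumption); id 0 plays the role of the CTC blank.
vocab = {0: "<blank>", 1: "o", 2: "l", 3: "a"}

# One predicted id per audio frame, as an argmax over the model's logits would give.
frames = [1, 1, 0, 2, 2, 2, 0, 3, 3]
print(ctc_greedy_decode(frames, vocab))
```

A blank between two identical ids keeps them as separate characters, which is how CTC can still emit doubled letters.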

  • Built on wav2vec2-large-xlsr-53 architecture
  • Fine-tuned with OpenSLR 77 and Mozilla Common Voice 8.0 datasets
  • Achieves 7.12% Word Error Rate (WER) on test data
  • Expects audio sampled at 16 kHz
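Audio recorded in stereo or at another sampling rate has to be converted before it reaches the model. The following is a rough pure-Python sketch of that conversion, assuming samples are already available as Python floats; the helper names are hypothetical. It uses plain linear interpolation with no anti-aliasing filter, so in practice a dedicated audio library would be the better choice for resampling.

```python
def to_mono(frames):
    """Average the channels of each frame (frames: list of per-frame channel tuples)."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample by linear interpolation (illustration only, no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# One second of hypothetical 48 kHz stereo silence becomes 16,000 mono samples.
stereo_48k = [(0.0, 0.0)] * 48_000
mono_16k = resample_linear(to_mono(stereo_48k), src_rate=48_000)
print(len(mono_16k))
```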

Core Capabilities

  • Automatic Speech Recognition (ASR) for the Galician language
  • Real-time audio transcription
  • 7.12% WER on the test set
  • Easy integration with the Transformers library
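As a sketch of what the Transformers integration might look like, the snippet below loads the model and transcribes one audio file. The repository ID `ifrz/wav2vec2-large-xlsr-galician` is an assumption inferred from the author and model name on this page, and the choice of librosa for loading audio is likewise an assumption; check the model repository for the author's own usage example.

```python
# Assumed Hugging Face repository ID (inferred from author + model name).
MODEL_ID = "ifrz/wav2vec2-large-xlsr-galician"
TARGET_RATE = 16_000  # the model expects 16 kHz mono input


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the greedy CTC transcription of a single audio file."""
    # Heavy dependencies are imported lazily so the module itself stays light.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to the requested rate and downmixes to mono on load.
    speech, _ = librosa.load(audio_path, sr=TARGET_RATE, mono=True)
    inputs = processor(
        speech, sampling_rate=TARGET_RATE, return_tensors="pt", padding=True
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

The first call downloads the model weights, so loading the processor and model once and reusing them is preferable when transcribing many files.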

Frequently Asked Questions

Q: What makes this model unique?

This model is fine-tuned specifically for Galician speech recognition. It combines the OpenSLR and Common Voice datasets for broader coverage of the language and reaches a 7.12% WER on its test set.
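For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard word-level edit distance; the example sentences are made up, and the page does not state what text normalization was applied when the 7.12% figure was measured.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of four reference words.
print(word_error_rate("bos días a todos", "bos día a todos"))
```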

Q: What are the recommended use cases?

The model is suited to Galician speech recognition applications such as transcription services, voice assistants, and other tools that convert Galician audio to text. Input audio should be 16 kHz mono; recordings in other formats need to be converted first.
