wav2vec2-xls-r-parlaspeech-hr

Maintained By
classla

Property        Value
Base Model      facebook/wav2vec2-xls-r-300m
Training Data   300 hours of ParlaSpeech-HR v1.0
Test WER        0.0761
Paper           ParlaSpeech-HR Paper

What is wav2vec2-xls-r-parlaspeech-hr?

This is a Croatian automatic speech recognition (ASR) model fine-tuned from the facebook/wav2vec2-xls-r-300m checkpoint on 300 hours of Croatian parliamentary speech from the ParlaSpeech-HR v1.0 dataset. It reaches a word error rate (WER) of 0.0761 on the test set.

Implementation Details

The model was fine-tuned for 8 epochs with a batch size of 16 and 4 gradient accumulation steps. The learning rate was set to 3e-4 with 500 warmup steps.
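To make the interaction between these settings concrete, here is a minimal sketch in plain Python. The dictionary keys follow the parameter names of the Transformers `TrainingArguments` class, but whether the original run used that class is an assumption. The helper functions are hypothetical, the batch size is assumed to be per device on a single device, and the linear warmup shape is assumed, since the source only gives the number of warmup steps.

```python
# Hyperparameters as stated in the text; key names mirror TrainingArguments.
HYPERPARAMETERS = {
    "per_device_train_batch_size": 16,  # assumed to be a per-device value
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 8,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}


def effective_batch_size(hp, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


def warmup_learning_rate(step, hp):
    """Learning rate at a given optimizer step, assuming linear warmup.

    Any decay after warmup is not described in the source and is omitted,
    so the rate simply stays at its peak once warmup has finished.
    """
    if step < hp["warmup_steps"]:
        return hp["learning_rate"] * step / hp["warmup_steps"]
    return hp["learning_rate"]


print(effective_batch_size(HYPERPARAMETERS))
print(warmup_learning_rate(250, HYPERPARAMETERS))
```

Under these assumptions, gradient accumulation multiplies the batch size of 16 by 4, so each optimizer step sees 64 examples per device.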

  • Achieves a Character Error Rate (CER) of 0.0234 on the test set
  • Achieves a Word Error Rate (WER) of 0.0761 on the test set
  • Uses the Wav2Vec2ForCTC architecture
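For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. This is an illustrative implementation, not the evaluation script used for the reported numbers; the function names and the example sentences are made up, and any text normalization applied in the original evaluation is unknown.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one of the four words is wrong by a single character.
reference = "poštovani zastupnici otvaram sjednicu"
hypothesis = "poštovani zastupnici otvara sjednicu"
print(wer(reference, hypothesis))
print(cer(reference, hypothesis))
```

In this example a single missing character makes one word in four wrong, so the WER is much higher than the CER. The same effect explains why the reported CER (0.0234) is lower than the reported WER (0.0761).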

Core Capabilities

  • Croatian speech recognition with a test-set WER of 0.0761
  • Best performance on parliamentary speech, the domain of its training data
  • Easy integration with the Transformers library
  • Support for various audio input formats, once decoded to a 16 kHz waveform
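As an illustration of the Transformers integration, the following sketch transcribes a single waveform with `Wav2Vec2Processor` and `Wav2Vec2ForCTC`. The Hub identifier is an assumption pieced together from the maintainer and model name shown on this page, and `transcribe` is a hypothetical helper. The 16 kHz requirement comes from the wav2vec 2.0 family in general rather than from this page. Greedy (argmax) CTC decoding is used here for simplicity.

```python
MODEL_ID = "classla/wav2vec2-xls-r-parlaspeech-hr"  # assumed Hub identifier
SAMPLE_RATE = 16_000  # wav2vec 2.0 models expect 16 kHz mono audio


def transcribe(waveform, sampling_rate=SAMPLE_RATE):
    """Transcribe one mono waveform (a sequence of floats) to text."""
    if sampling_rate != SAMPLE_RATE:
        raise ValueError(
            f"expected {SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample before calling transcribe()"
        )

    # Imported here so the input check above runs without loading
    # the heavy dependencies.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: pick the most likely token per frame, then let the
    # processor collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(samples)` with a 16 kHz mono waveform downloads the checkpoint on first use. For repeated calls, loading the processor and model once outside the function would avoid reloading them each time.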

Frequently Asked Questions

Q: What makes this model unique?

This model is fine-tuned specifically for Croatian ASR on 300 hours of parliamentary speech, which makes it particularly well suited to formal Croatian speech.

Q: What are the recommended use cases?

The model is suited to transcribing Croatian speech, particularly in formal contexts such as parliamentary proceedings, official speeches, and professional settings that resemble its training data.
