wav2vec2-xls-r-parlaspeech-hr

Property	Value
Base Model	facebook/wav2vec2-xls-r-300m
Training Data	300 hours of ParlaSpeech-HR v1.0
Test WER	0.0761
Paper	ParlaSpeech-HR Paper

What is wav2vec2-xls-r-parlaspeech-hr?

This is a Croatian automatic speech recognition (ASR) model built on the facebook/wav2vec2-xls-r-300m checkpoint. It was fine-tuned on 300 hours of Croatian parliamentary speech from the ParlaSpeech-HR v1.0 dataset and reaches a word error rate (WER) of 0.0761 on the test set.

Implementation Details

The model was fine-tuned for 8 epochs with a batch size of 16, 4 gradient accumulation steps, a learning rate of 3e-4, and 500 warmup steps.
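To make the interaction between these settings concrete, the following is a minimal sketch that collects them in a plain Python dataclass. The class name FineTuneConfig and its methods are hypothetical. The sketch assumes that the batch size of 16 is per device, that training ran on a single device, and that warmup is linear; none of these three points is stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters reported for the fine-tuning run (names are illustrative)."""

    per_device_batch_size: int = 16
    gradient_accumulation_steps: int = 4
    num_epochs: int = 8
    learning_rate: float = 3e-4
    warmup_steps: int = 500

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Gradients from several small batches are summed before each
        # optimizer step, so one update sees batch * accumulation examples.
        return self.per_device_batch_size * self.gradient_accumulation_steps * num_devices

    def learning_rate_at(self, step: int) -> float:
        # Assumption: linear warmup from 0 to the peak rate. What happens
        # after warmup (decay or constant) is not stated, so the peak is held.
        if step < self.warmup_steps:
            return self.learning_rate * step / self.warmup_steps
        return self.learning_rate


config = FineTuneConfig()
print(config.effective_batch_size())   # examples per optimizer update
print(config.learning_rate_at(250))    # rate halfway through warmup
```

Under these assumptions each optimizer update sees 16 x 4 = 64 examples, and the learning rate reaches its peak of 3e-4 at step 500.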

Achieves a character error rate (CER) of 0.0234 on the test set
Achieves a word error rate (WER) of 0.0761 on the test set
Uses the Wav2Vec2ForCTC architecture (wav2vec 2.0 with a CTC output head)
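Both metrics are edit distances normalized by the length of the reference: WER counts word-level substitutions, insertions, and deletions, while CER does the same at character level. A WER of 0.0761 therefore corresponds to roughly 7.6 word errors per 100 reference words. The sketch below shows one common way to compute them. It is not the evaluation script used for this model, the function names are illustrative, and it assumes spaces count as characters for CER (conventions vary).

```python
from typing import Sequence


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "poštovani zastupnici dobar dan"
hypothesis = "poštovani zastupnik dobar dan"
print(wer(reference, hypothesis))  # one wrong word out of four
print(cer(reference, hypothesis))
```

In this example a single misrecognized word out of four gives a WER of 0.25, while the CER is much lower because most characters in that word are still correct. This is why the reported CER sits well below the reported WER.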

Core Capabilities

Croatian speech recognition with a test WER of 0.0761 and a test CER of 0.0234
Best results on parliamentary speech, the domain it was fine-tuned on
Easy integration with the Transformers library
Support for various audio input formats
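As an illustration of the Transformers integration, here is a minimal transcription sketch using greedy CTC decoding. The Hub identifier in MODEL_ID is an assumption, since only the short model name is given here, and the helper name transcribe is hypothetical. The sketch also assumes the input has already been decoded to a mono waveform at 16 kHz, the sampling rate that XLS-R checkpoints expect; decoding and resampling other audio formats is left to an audio loader of your choice.

```python
from typing import Sequence

# Assumed Hub identifier; verify it against the model's actual repository.
MODEL_ID = "classla/wav2vec2-xls-r-parlaspeech-hr"
# XLS-R checkpoints are pretrained on audio sampled at 16 kHz.
TARGET_SAMPLE_RATE = 16_000


def transcribe(waveform: Sequence[float], sampling_rate: int, model_id: str = MODEL_ID) -> str:
    """Transcribe one mono waveform with greedy (argmax) CTC decoding."""
    if sampling_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )

    # Heavy dependencies are imported lazily so the check above stays cheap.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # The processor normalizes the waveform and builds the model inputs.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; batch_decode collapses repeats
    # and removes CTC blank tokens to produce the final text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

For long recordings it is usually better to split the audio into shorter segments before calling a function like this, because memory use grows with input length.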

Frequently Asked Questions

Q: What makes this model unique?

The model is fine-tuned specifically for Croatian ASR on 300 hours of parliamentary speech, which makes it particularly well suited to formal Croatian speech.

Q: What are the recommended use cases?

The model is best suited to transcribing Croatian speech in formal settings such as parliamentary proceedings, official speeches, and professional environments where high accuracy is required.