filipino-wav2vec2-l-xls-r-300m-official

Khalsuu

A Filipino speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m that reaches a 29.22% word error rate (WER) after 30 epochs of training with a linear learning rate schedule.

Property | Value
Base Model | facebook/wav2vec2-xls-r-300m
Task | Filipino speech recognition
Performance | 29.22% WER
Author | Khalsuu
Model Link | Hugging Face

What is filipino-wav2vec2-l-xls-r-300m-official?

This is a speech recognition model for the Filipino language, fine-tuned from Facebook's pretrained wav2vec2-xls-r-300m checkpoint. It reaches a word error rate (WER) of 29.22% on its evaluation set, which makes it a reasonable starting point for Filipino speech-to-text applications.
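
WER counts the word-level substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. The minimal Python sketch below computes it with a word-level edit distance. The Filipino phrases are illustrative only and are not taken from this model's evaluation set; real evaluations usually normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and the first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the hypothesis
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative example: one deleted word ("po") and one substitution ("inyong" -> "inyo")
ref = "magandang umaga po sa inyong lahat"
hyp = "magandang umaga sa inyo lahat"
print(round(word_error_rate(ref, hyp), 4))  # 0.3333 (2 errors / 6 reference words)
```

Because insertions are counted, WER can exceed 100% when a transcript contains many extra words.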

Implementation Details

The model was trained with the Adam optimizer (β1 = 0.9, β2 = 0.999), a linear learning rate schedule with warmup, and mixed precision training via Native AMP. Training ran for 30 epochs with a learning rate of 0.0003 and a total batch size of 16, that is, a training batch size of 8 combined with 2 gradient accumulation steps. The remaining settings:

  • Gradient accumulation steps: 2
  • Learning rate warmup steps: 500
  • Training batch size: 8
  • Evaluation batch size: 8
  • Seed: 42
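
As a sketch, the hyperparameters above can be collected as keyword arguments for Hugging Face `transformers.TrainingArguments`; that mapping is an assumption, since the card does not publish its training script. The hypothetical helper `linear_warmup_lr` mirrors the usual linear-with-warmup rule: the learning rate rises from 0 to its peak over the 500 warmup steps, then decays linearly to 0 at the last step. The total step count depends on the dataset size, which is not given here, so the 3000 used below is purely hypothetical.

```python
# Hyperparameters from the model card, written as TrainingArguments keyword
# arguments (assumed mapping; the original training script is not published).
TRAINING_KWARGS = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "warmup_steps": 500,
    "num_train_epochs": 30,
    "lr_scheduler_type": "linear",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "seed": 42,
    "fp16": True,  # Native AMP mixed precision; requires a CUDA GPU
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer update (a single device is assumed)."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int) -> float:
    """Hypothetical helper: linear warmup from 0 to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(TRAINING_KWARGS))  # 16
TOTAL_STEPS = 3000  # hypothetical; the real value depends on dataset size
for step in (0, 250, 500, 1750, 3000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS, TRAINING_KWARGS["learning_rate"], TRAINING_KWARGS["warmup_steps"]))
```

On a GPU machine these could be passed as `TrainingArguments(output_dir="...", **TRAINING_KWARGS)`. The `fp16=True` entry is why the training arguments themselves are not constructed in the sketch: that flag is rejected on CPU-only setups.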

Core Capabilities

  • Filipino speech recognition with 29.22% WER
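  • Trained with mixed precision (Native AMP), which reduces GPU memory use and training time
  • Usable with standard Hugging Face Transformers speech recognition tooling
  • Built on wav2vec2-xls-r-300m, a 300M-parameter wav2vec 2.0 model pretrained on speech in 128 languages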

Frequently Asked Questions

Q: What makes this model unique?

This model targets Filipino speech recognition specifically, building on the multilingual wav2vec2-xls-r-300m pretrained model, and reaches 29.22% WER. During training, the evaluation WER fell from 59.87% to 29.22%.
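
Using only the two WER values reported for this model, the size of that improvement works out as follows:

```latex
\Delta_{\text{abs}} = 59.87\% - 29.22\% = 30.65 \text{ percentage points}
\qquad
\Delta_{\text{rel}} = \frac{59.87 - 29.22}{59.87} = \frac{30.65}{59.87} \approx 0.512
```

Assuming both values were measured on the same evaluation set, the final model makes roughly half as many word errors as it did at the 59.87% point.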

Q: What are the recommended use cases?

The model suits Filipino speech-to-text applications such as transcription services, voice assistants, and automated subtitling. A 29.22% WER means roughly 29 word errors (substitutions, deletions, and insertions) per 100 reference words, so applications that need highly accurate transcripts should plan for human review or further fine-tuning on in-domain data.
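
For a transcription service, a minimal Python sketch using the Transformers `pipeline("automatic-speech-recognition", ...)` API might look like the following. The repository ID `Khalsuu/filipino-wav2vec2-l-xls-r-300m-official` is an assumption inferred from the author and model name, and the helper names `load_asr` and `transcribe` are hypothetical.

```python
from functools import lru_cache

# Assumed Hugging Face Hub repository ID (inferred from the author and model name).
MODEL_ID = "Khalsuu/filipino-wav2vec2-l-xls-r-300m-official"


@lru_cache(maxsize=None)
def load_asr(model_id: str = MODEL_ID):
    """Build the ASR pipeline once per model ID and reuse it across calls."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    asr = load_asr(model_id)
    # Given a file path, the pipeline decodes the audio with ffmpeg and resamples it
    # to the feature extractor's sampling rate (16 kHz for XLS-R checkpoints).
    result = asr(audio_path)
    return result["text"]
```

Usage, with a hypothetical file name: `print(transcribe("sample_tagalog.wav"))`. Caching the pipeline avoids reloading the 300M-parameter model for every file in a batch job.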

PromptLayer © 2026