wav2vec2-base-vietnamese-250h

Maintained By
nguyenvulebinh

Property        Value
Parameters      95M
License         CC BY-NC 4.0
Author          nguyenvulebinh
Training Data   13k hours pre-training, 250 hours fine-tuning

What is wav2vec2-base-vietnamese-250h?

wav2vec2-base-vietnamese-250h is a Vietnamese automatic speech recognition (ASR) model based on Facebook's wav2vec 2.0 architecture. It was pre-trained on 13,000 hours of unlabeled Vietnamese YouTube audio and fine-tuned on 250 hours of labeled data from the VLSP ASR dataset. Combined with a 4-gram language model, it reaches a Word Error Rate (WER) of 6.15% on the VIVOS dataset and 11.52% on Common Voice VI.

Implementation Details

The model uses the wav2vec 2.0 architecture and is fine-tuned with a Connectionist Temporal Classification (CTC) objective. It expects speech audio sampled at 16 kHz and acts as an acoustic model; an optional 4-gram language model can be applied during decoding to improve accuracy.

  • Pre-trained on 13k hours of unlabeled Vietnamese audio
  • Fine-tuned on 250 hours of labeled VLSP ASR data
  • Supports audio input sampled at 16kHz
  • Optimized for audio segments under 10 seconds
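As a rough illustration of how a CTC-trained acoustic model's output becomes text, the sketch below shows greedy CTC decoding: take the highest-scoring token in each frame, merge consecutive repeats, then drop the blank token. The toy vocabulary, the blank index, and the function names are assumptions made for this example; they are not the model's actual vocabulary or API, and the optional 4-gram language model is not involved here.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration only.
# The real model ships its own vocabulary and blank token index.
BLANK_ID = 0
VOCAB = {0: "<blank>", 1: "x", 2: "i", 3: "n", 4: " ", 5: "c", 6: "h", 7: "à", 8: "o"}


def ctc_greedy_decode(frame_scores, vocab=VOCAB, blank_id=BLANK_ID):
    """Decode per-frame scores: argmax per frame, merge repeats, drop blanks."""
    # Index of the highest-scoring token in every frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # Collapse runs of the same token into a single token.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # Remove blanks, which only separate repeated characters.
    return "".join(vocab[token_id] for token_id in collapsed if token_id != blank_id)


def one_hot(token_id, size=len(VOCAB)):
    """Build a fake score vector that puts all mass on one token."""
    return [1.0 if i == token_id else 0.0 for i in range(size)]


# Fake frame-level output: repeated tokens and blanks, as CTC models emit.
frames = [one_hot(i) for i in [1, 1, 0, 2, 2, 3, 0, 4, 5, 6, 7, 7, 0, 8]]
print(ctc_greedy_decode(frames))  # prints: xin chào
```

Greedy decoding uses only the acoustic scores. A language model, such as the 4-gram model mentioned above, is typically applied through beam search instead, which is why it can be treated as an optional add-on.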

Core Capabilities

  • 6.15% WER on the VIVOS dataset with the 4-gram language model
  • 11.52% WER on Common Voice VI with the 4-gram language model
  • Supports end-to-end speech recognition
  • Can be used with or without the 4-gram language model
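For readers unfamiliar with the WER figures above, here is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model's hypothesis, divided by the number of reference words. The function name and the sample sentences are assumptions for illustration; this is not the evaluation script behind the reported numbers.

```python
def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# One wrong word out of five reference words.
print(word_error_rate("tôi đang học tiếng việt", "tôi đang học tiến việt"))  # prints: 0.2
```

A WER of 6.15% therefore means roughly six word-level errors per hundred reference words.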

Frequently Asked Questions

Q: What makes this model unique?

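This model is presented as the first Vietnamese speech recognition system to reach state-of-the-art results with wav2vec 2.0's self-supervised learning approach. It learns speech representations from unlabeled raw audio during pre-training, so only 250 hours of labeled data are needed for fine-tuning. The wav2vec 2.0 approach has been reported to outperform earlier semi-supervised methods.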

Q: What are the recommended use cases?

The model is suited to Vietnamese speech recognition, particularly for audio segments under 10 seconds sampled at 16 kHz. It fits applications that need accurate transcription, but the CC BY-NC 4.0 license restricts it to non-commercial use.
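Since the model is optimized for short segments, longer recordings are usually split before transcription. The sketch below cuts a sequence of 16 kHz samples into pieces of at most 10 seconds. It is a simplified assumption of how one might do this: the function name is hypothetical, the 10-second limit is treated as an inclusive upper bound, and the fixed-length cut can land in the middle of a word, so a real pipeline would more likely split on silence.

```python
SAMPLE_RATE = 16_000  # the model expects 16 kHz audio
MAX_SECONDS = 10      # assumed upper bound on segment length


def split_into_segments(samples, sample_rate=SAMPLE_RATE, max_seconds=MAX_SECONDS):
    """Split a sequence of samples into consecutive fixed-length segments."""
    max_len = sample_rate * max_seconds
    return [samples[start:start + max_len] for start in range(0, len(samples), max_len)]


# 25 seconds of silence as stand-in audio.
audio = [0.0] * (SAMPLE_RATE * 25)
segments = split_into_segments(audio)
print([len(segment) / SAMPLE_RATE for segment in segments])  # prints: [10.0, 10.0, 5.0]
```

Each segment can then be transcribed separately and the resulting texts joined in order.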
