# wav2vec2-base-vi
| Property | Value |
|---|---|
| Parameter Count | 95M |
| License | CC-BY-NC-4.0 |
| Training Data | 13k hours of Vietnamese YouTube audio |
| Architecture | Wav2Vec2 Base |
## What is wav2vec2-base-vi?
wav2vec2-base-vi is a self-supervised speech model designed specifically for Vietnamese speech recognition. Developed by nguyenvulebinh, it is pretrained on 13,000 hours of Vietnamese YouTube audio spanning clean and noisy recordings, conversational speech, and multiple genders and dialects. The model employs the wav2vec2 architecture, which has proven highly effective for speech processing tasks.
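In wav2vec2-style pretraining, the model learns by predicting the quantized representation of a masked time step against a set of distractors sampled from other steps, via a contrastive (InfoNCE) objective. A minimal sketch of that loss for a single masked frame (function name and shapes are illustrative, not the library's API):

```python
import numpy as np

def info_nce(pred: np.ndarray, target: np.ndarray,
             distractors: np.ndarray, temp: float = 0.1) -> float:
    """Contrastive loss for one masked frame.

    pred: context-network output at the masked step, shape (D,)
    target: the true quantized latent at that step, shape (D,)
    distractors: negatives sampled from other steps, shape (K, D)
    """
    candidates = np.vstack([target[None, :], distractors])
    # Cosine similarity between the prediction and each candidate
    sims = candidates @ pred / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(pred) + 1e-9)
    logits = sims / temp
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))  # true target sits at index 0
```

When the prediction matches the true latent and not the distractors, the loss approaches zero; pretraining minimizes this over many masked steps, which is what lets the model learn from unlabeled audio.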
## Implementation Details
The model was trained for 35 epochs on a TPU v3-8, following the same wav2vec2 architecture as its English counterpart. It integrates directly with the Transformers library and can be fine-tuned for downstream speech recognition tasks.
- Transformer-based architecture optimized for Vietnamese speech
- Trained on diverse audio sources ensuring robust performance
- Compatible with Hugging Face's Transformers library
- Supports both base (95M params) and large (317M params) versions
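As an example of the Transformers integration mentioned above, the sketch below extracts frame-level features from the pretrained encoder. The Hub repo id `nguyenvulebinh/wav2vec2-base-vi`, the 16 kHz mono input format, and the helper names are assumptions; verify them against the actual model card before use.

```python
import numpy as np

def to_mono_float32(samples: np.ndarray) -> np.ndarray:
    """Downmix multi-channel audio and peak-normalize to float32 in [-1, 1]."""
    if samples.ndim == 2:
        samples = samples.mean(axis=1)  # average channels to mono
    samples = samples.astype(np.float32)
    peak = float(np.abs(samples).max())
    return samples / peak if peak > 0 else samples

def extract_features(speech: np.ndarray,
                     repo_id: str = "nguyenvulebinh/wav2vec2-base-vi"):
    """Run the pretrained encoder, returning frame-level hidden states.

    Assumes 16 kHz mono float32 input and a Hub checkpoint at repo_id.
    """
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(repo_id)
    model = Wav2Vec2Model.from_pretrained(repo_id)
    inputs = extractor(speech, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        # Shape (1, frames, 768) for the base model
        return model(inputs.input_values).last_hidden_state
```

Usage would look like `features = extract_features(to_mono_float32(waveform))`; for transcription rather than feature extraction, a CTC head must be fine-tuned on top (or a fine-tuned checkpoint loaded via `Wav2Vec2ForCTC`).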
## Core Capabilities
- Self-supervised learning for speech recognition
- Achieves 8.66% WER without a language model and 6.53% with a 5-gram LM on the VLSP 2020 dataset
- Supports both inference with and without language model integration
- Handles various Vietnamese dialects and audio conditions
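Decoding without a language model is typically a greedy argmax over the CTC output: take the most likely token per frame, merge consecutive repeats, and drop blanks. A minimal pure-Python sketch of that collapse step (the token strings are illustrative; wav2vec2-style CTC vocabularies commonly use `<pad>` as the blank and `|` as the word delimiter):

```python
def ctc_greedy_collapse(frame_tokens: list[str], blank: str = "<pad>") -> str:
    """Collapse per-frame CTC labels: merge consecutive repeats, drop blanks."""
    out, prev = [], None
    for tok in frame_tokens:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    # "|" marks word boundaries in wav2vec2-style vocabularies
    return "".join(out).replace("|", " ").strip()

# Per-frame argmax labels from the acoustic model (illustrative)
frames = ["x", "x", "<pad>", "i", "n", "<pad>", "|", "c", "h", "à", "o"]
print(ctc_greedy_collapse(frames))  # -> "xin chào"
```

Decoding with an n-gram LM instead rescores candidate sequences during a beam search (e.g. with a tool such as pyctcdecode), which is where the WER improvement reported above comes from.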
## Frequently Asked Questions
Q: What makes this model unique?
The model is trained on one of the largest Vietnamese audio datasets assembled for speech modeling, making it among the most extensively trained Vietnamese speech models available. While the architecture is a standard wav2vec2 Base, the diverse Vietnamese training data adapts its representations to the language, and it remains fully compatible with standard wav2vec2 implementations.
Q: What are the recommended use cases?
The model is ideal for Vietnamese speech recognition tasks, particularly in applications requiring transcription of YouTube content, conversational audio, or mixed-condition speech. It can be used both with and without a language model, depending on accuracy requirements.