PhoWhisper-large

vinai

Vietnamese automatic speech recognition model, fine-tuned from the multilingual Whisper model on 844 hours of Vietnamese speech covering diverse accents

Property	Value
License	BSD-3-Clause
Language	Vietnamese
Downloads	72,556
Framework	PyTorch

What is PhoWhisper-large?

PhoWhisper-large is a state-of-the-art Automatic Speech Recognition (ASR) model specifically designed for the Vietnamese language. It's built by fine-tuning the multilingual Whisper model on an extensive dataset of 844 hours of Vietnamese speech, encompassing various regional accents and dialects.

Implementation Details

The model uses the Transformer architecture and is implemented in PyTorch. It is the largest of five PhoWhisper versions (tiny, base, small, medium, and large) developed by VinAI for Vietnamese speech recognition, which achieve state-of-the-art performance on benchmark Vietnamese ASR datasets.

Built on OpenAI's Whisper architecture
Fine-tuned on 844 hours of Vietnamese speech data
Optimized for multiple Vietnamese accents
Uses Whisper's Transformer encoder-decoder (sequence-to-sequence) design
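
As a minimal sketch of how the model might be loaded, assuming it is published on the Hugging Face Hub under the ID vinai/PhoWhisper-large and that the transformers library (with PyTorch, plus ffmpeg for audio decoding) is installed, the generic automatic-speech-recognition pipeline can be used. The function name transcribe and the audio file name are illustrative, not part of the model's own documentation.

```python
# Assumed Hub ID, formed from the author name "vinai" and the model name.
MODEL_ID = "vinai/PhoWhisper-large"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # The checkpoint is downloaded on first use, which needs network access.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"]


# Example call (hypothetical file name):
# text = transcribe("sample_vi.wav")
```

Building the pipeline once and reusing it across files avoids reloading the weights for every call; the sketch rebuilds it each time only to stay short.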

Core Capabilities

High-accuracy Vietnamese speech recognition
Robust performance across different Vietnamese accents
Deployable for real-world applications through Inference Endpoints
State-of-the-art results on Vietnamese ASR benchmarks
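
ASR benchmark results are usually reported as word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below implements that standard definition; it is not the evaluation script used for PhoWhisper, and it applies no text normalization (casing, punctuation), which real benchmarks typically do before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One word dropped out of four reference words gives a WER of 0.25.
print(word_error_rate("xin chào việt nam", "xin chào nam"))
```

Lower is better, and because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.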

Frequently Asked Questions

Q: What makes this model unique?

PhoWhisper-large stands out for its specialized focus on Vietnamese language processing, extensive training data incorporating diverse accents, and state-of-the-art performance on Vietnamese ASR benchmarks.

Q: What are the recommended use cases?

The model is ideal for Vietnamese speech-to-text applications, including transcription services, voice assistants, and automated subtitle generation for Vietnamese content.
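
For the subtitle use case, a rough sketch of turning timestamped segments into SubRip (SRT) text is shown here. It assumes the segments look like the chunks list that the transformers speech recognition pipeline returns when called with return_timestamps=True (dicts with a "timestamp" pair in seconds and a "text" string). The helper names and the two-second fallback for a missing end time are illustrative choices.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list[dict]) -> str:
    """Render timestamped chunks as numbered SRT subtitle blocks."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: the final chunk may lack an end time; show it for 2 s.
            end = start + 2.0
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # Blank line between blocks, as the SRT format expects.
    return "\n".join(blocks)


example = [
    {"timestamp": (0.0, 1.5), "text": " Xin chào"},
    {"timestamp": (1.5, 3.0), "text": " Việt Nam"},
]
print(chunks_to_srt(example))
```

Production subtitles usually also need line-length limits and minimum display durations, which this sketch leaves out.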