PhoWhisper-large

vinai

Vietnamese-focused automatic speech recognition model, fine-tuned on 844 hours of speech covering diverse Vietnamese accents and based on the multilingual Whisper architecture

Property     Value
License      BSD-3-Clause
Language     Vietnamese
Downloads    72,556
Framework    PyTorch

What is PhoWhisper-large?

PhoWhisper-large is a state-of-the-art Automatic Speech Recognition (ASR) model specifically designed for the Vietnamese language. It's built by fine-tuning the multilingual Whisper model on an extensive dataset of 844 hours of Vietnamese speech, encompassing various regional accents and dialects.

Implementation Details

The model uses the Transformer architecture and is implemented in PyTorch. It is one of five PhoWhisper versions developed by VinAI for Vietnamese speech recognition, and it is reported to achieve state-of-the-art results on benchmark Vietnamese ASR datasets.

  • Built on OpenAI's Whisper architecture
  • Fine-tuned on 844 hours of Vietnamese speech data
  • Optimized for multiple Vietnamese accents
  • Uses Whisper's Transformer encoder-decoder design
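
Because the model is a fine-tuned Whisper checkpoint implemented in PyTorch, it can most likely be loaded like any other Whisper model. The following is a minimal sketch, not the authors' reference code. It assumes the checkpoint is published on the Hugging Face Hub under the ID vinai/PhoWhisper-large and that the transformers library is installed. The helper names build_transcriber and transcribe, and the file name in the usage note, are hypothetical.

```python
# Assumed Hub ID for the checkpoint; adjust it if the model lives elsewhere.
MODEL_ID = "vinai/PhoWhisper-large"


def build_transcriber(model_id=MODEL_ID, chunk_length_s=30):
    """Create a speech recognition pipeline for the given checkpoint."""
    # Imported lazily so the module can be imported without transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,  # split long recordings into 30 s windows
    )


def transcribe(audio_path, transcriber=None):
    """Return the transcript text for one audio file."""
    if transcriber is None:
        transcriber = build_transcriber()
    # The pipeline returns a dict whose "text" key holds the transcript.
    return transcriber(audio_path)["text"]
```

Calling transcribe("sample_vi.wav") would download the checkpoint on first use and return the Vietnamese transcript as a string. Passing a prebuilt transcriber avoids reloading the model for every file.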

Core Capabilities

  • High-accuracy Vietnamese speech recognition
  • Robust performance across different Vietnamese accents
  • Support for real-world applications through Inference Endpoints
  • State-of-the-art results on Vietnamese ASR benchmarks

Frequently Asked Questions

Q: What makes this model unique?

PhoWhisper-large stands out for its specialized focus on Vietnamese language processing, extensive training data incorporating diverse accents, and state-of-the-art performance on Vietnamese ASR benchmarks.

Q: What are the recommended use cases?

The model is ideal for Vietnamese speech-to-text applications, including transcription services, voice assistants, and automated subtitle generation for Vietnamese content.
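
For the subtitle use case, one possible approach is to convert timestamped transcript segments into SRT text. The sketch below assumes the segments arrive in the shape that a Hugging Face speech recognition pipeline returns under "chunks" when called with return_timestamps=True, that is, a list of dicts with "timestamp" and "text" keys. The function names format_srt_time and chunks_to_srt are illustrative and are not part of PhoWhisper.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks):
    """Turn [{'timestamp': (start, end), 'text': ...}, ...] into SRT text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # The last segment may have no end time; fall back to its start.
            end = start
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(entries)
```

The resulting string can be written to a .srt file and loaded by most video players alongside the original recording.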
