PhoWhisper-large

Maintained By
vinai

Property    Value
License     BSD-3-Clause
Language    Vietnamese
Downloads   72,556
Framework   PyTorch

What is PhoWhisper-large?

PhoWhisper-large is a state-of-the-art Automatic Speech Recognition (ASR) model specifically designed for the Vietnamese language. It's built by fine-tuning the multilingual Whisper model on an extensive dataset of 844 hours of Vietnamese speech, encompassing various regional accents and dialects.

Implementation Details

The model uses the Transformer architecture and is implemented in PyTorch. It is one of five PhoWhisper versions developed by VinAI for Vietnamese speech recognition, and it is reported to achieve state-of-the-art results on benchmark Vietnamese ASR datasets.

  • Built on OpenAI's Whisper architecture
  • Fine-tuned on 844 hours of Vietnamese speech data
  • Optimized for multiple Vietnamese accents
  • Uses the Transformer-based encoder-decoder design inherited from Whisper
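
As a rough illustration of how a Whisper-style checkpoint like this one is typically loaded, here is a minimal sketch using the Hugging Face `transformers` ASR pipeline. The model ID `vinai/PhoWhisper-large` is inferred from the maintainer and model name above, and the helper names `build_asr_pipeline` and `transcribe` are hypothetical; treat this as one plausible setup, not the maintainers' reference code.

```python
def build_asr_pipeline(model_id="vinai/PhoWhisper-large"):
    """Create a Hugging Face ASR pipeline for the given checkpoint.

    The model ID is an assumption based on the maintainer and model name.
    """
    # Imported lazily so the helpers can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path, asr=None):
    """Return the transcript for one audio file as a plain string."""
    if asr is None:
        asr = build_asr_pipeline()  # downloads the weights on first use
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"].strip()
```

Calling `transcribe("sample.wav")` would download the checkpoint on first use and return the recognized Vietnamese text. Passing an already-built pipeline as `asr` avoids reloading the model for every file.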

Core Capabilities

  • High-accuracy Vietnamese speech recognition
  • Robust performance across different Vietnamese accents
  • Support for real-world applications through Inference Endpoints
  • State-of-the-art results on Vietnamese ASR benchmarks
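
ASR benchmark results are commonly reported as word error rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below assumes WER is the metric behind the benchmark claims above, which this page does not state, and it splits on whitespace, so for Vietnamese it effectively counts space-separated syllables. The function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Published evaluations usually normalize text (casing, punctuation) before scoring, so raw numbers from this sketch are not directly comparable with reported benchmark figures.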

Frequently Asked Questions

Q: What makes this model unique?

PhoWhisper-large stands out for its specialized focus on Vietnamese language processing, extensive training data incorporating diverse accents, and state-of-the-art performance on Vietnamese ASR benchmarks.

Q: What are the recommended use cases?

The model is ideal for Vietnamese speech-to-text applications, including transcription services, voice assistants, and automated subtitle generation for Vietnamese content.
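
For the subtitle use case, one possible approach is to request timestamps from the `transformers` ASR pipeline (`return_timestamps=True`), which returns a `chunks` list whose items carry a `timestamp` pair in seconds and a `text` string, and then convert those chunks to SRT. The sketch below covers only the conversion step. The helper names are hypothetical, and it assumes the final chunk may arrive without an end time.

```python
def format_srt_time(seconds):
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Convert pipeline-style chunks into the text of an SRT subtitle file."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: fall back to the start time when no end is given.
            end = start
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)
```

With a loaded pipeline `asr`, something like `chunks_to_srt(asr("talk.wav", return_timestamps=True)["chunks"])` would produce subtitle text ready to write to a `.srt` file.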
