Belle-whisper-large-v3-zh-punct
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Author | BELLE-2 |
| Framework | PyTorch |
| Task | Automatic Speech Recognition |
What is Belle-whisper-large-v3-zh-punct?
Belle-whisper-large-v3-zh-punct is a Chinese automatic speech recognition (ASR) model built on the Whisper large-v3 architecture and fine-tuned to produce better punctuation. It keeps recognition accuracy on standard benchmarks comparable to the base model, and its punctuation marks are derived from the punc_ct-transformer model.
Implementation Details
The model is fine-tuned with LoRA on several Chinese speech datasets, including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST. It operates on audio at a 16 kHz sample rate and is evaluated by Character Error Rate (CER): the character-level edit distance between the transcript and the reference, divided by the reference length.
- Adds punctuation marks to transcripts while maintaining base ASR performance
- Optimized for complex acoustic environments
- Improved performance in meeting scenarios (10.973% CER on the WenetSpeech meeting dataset)
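The CER figures quoted here can be reproduced in spirit with a few lines of Python. The following is a minimal sketch, not the authors' evaluation script: it assumes punctuation and whitespace are stripped before scoring (a common convention for punctuated ASR output, not something the source states), and the function names are illustrative.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop punctuation and whitespace so only spoken characters are scored.

    Assumption: punctuation is excluded from CER; the source does not say so.
    """
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    ref = strip_punctuation(reference)
    hyp = strip_punctuation(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# One wrong character out of six; the trailing full stop is ignored.
score = cer("今天天气很好", "今天天汽很好。")
```

Multiplying the result by 100 gives a percentage comparable in form to the figures in this document, although the exact text normalization used for the reported numbers is not specified here.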
Core Capabilities
- Chinese speech recognition with a CER as low as 2.945% on AISHELL-1
- Enhanced punctuation mark processing
- Robust performance across various acoustic environments
- Integration with the Transformers pipeline
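As a rough illustration of the Transformers pipeline integration, the sketch below wraps the `automatic-speech-recognition` pipeline in two small helpers. The Hub identifier `BELLE-2/Belle-whisper-large-v3-zh-punct` is an assumption inferred from the author and model name in this document, and the helper names and the audio path are hypothetical.

```python
# Assumed Hub id, inferred from the author and model name; verify before use.
MODEL_ID = "BELLE-2/Belle-whisper-large-v3-zh-punct"


def build_transcriber(model_id: str = MODEL_ID):
    """Create an ASR pipeline for the model (downloads weights on first use)."""
    # Imported lazily so defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(path: str, transcriber=None) -> str:
    """Return the punctuated transcript for one audio file."""
    if transcriber is None:
        transcriber = build_transcriber()
    # Ask Whisper to transcribe in Chinese instead of auto-detecting language.
    result = transcriber(
        path,
        generate_kwargs={"language": "zh", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("my_audio.wav")` with a real file should return the text with punctuation marks included. When given a file path, the pipeline decodes the audio with ffmpeg and resamples it to the rate the feature extractor expects, so ffmpeg needs to be installed.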
Frequently Asked Questions
Q: What makes this model unique?
The model combines Chinese ASR with improved punctuation output. The punctuation capability is added through LoRA fine-tuning, without compromising the base model's performance.
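For readers unfamiliar with LoRA, the general idea can be written as a low-rank update to a frozen weight matrix. This is the standard LoRA formulation, not a description of this model's specific configuration; the rank $r$ and scaling factor $\alpha$ used for this model are not given in this document.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Only $A$ and $B$ are trained while $W_0$ stays fixed, and $B$ is conventionally initialized to zero so that training starts from the base model's behavior. This helps explain how a new capability such as punctuation can be added while the original recognition behavior is largely preserved.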
Q: What are the recommended use cases?
The model suits Chinese speech transcription tasks that need accurately punctuated output, particularly recordings made in difficult acoustic conditions such as meetings.