Belle-whisper-large-v3-zh

Property	Value
License	Apache-2.0
Author	BELLE-2
Framework	PyTorch, Transformers
Task	Automatic Speech Recognition

What is Belle-whisper-large-v3-zh?

Belle-whisper-large-v3-zh is a Chinese speech recognition model built on OpenAI's Whisper large-v3 architecture. It was fine-tuned on several major Chinese speech datasets and, across the reported benchmarks, improves character error rate (CER) by 24-65% relative to the base model.

Implementation Details

The model was fully fine-tuned (all parameters updated) on several high-quality Chinese speech datasets, including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST. It expects 16 kHz audio input and can be deployed with the Hugging Face Transformers library.
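Because the model expects 16 kHz audio, it helps to check input files before transcription. The sketch below uses only the Python standard library and assumes mono 16-bit PCM WAV input; the function name `load_pcm16_wav` is hypothetical, and audio in other formats or at other sample rates would need to be converted first.

```python
import sys
import wave
from array import array

TARGET_SR = 16_000  # sample rate stated for Belle-whisper-large-v3-zh


def load_pcm16_wav(source) -> list[float]:
    """Read a mono 16-bit PCM WAV and return samples scaled to [-1.0, 1.0).

    `source` may be a path or a binary file-like object. Raises ValueError
    if the audio is not 16 kHz, mono, 16-bit PCM (hypothetical helper).
    """
    with wave.open(source, "rb") as wf:
        if wf.getframerate() != TARGET_SR:
            raise ValueError(
                f"expected {TARGET_SR} Hz, got {wf.getframerate()} Hz; resample first"
            )
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wf.readframes(wf.getnframes())

    samples = array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples]
```

The returned list can be converted with `numpy.asarray(samples, dtype=numpy.float32)` and passed to a Transformers ASR pipeline as `{"raw": ..., "sampling_rate": 16000}`.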

Significant performance improvements on Chinese ASR benchmarks
Targets complex acoustic environments such as meeting recordings
Easy integration through the Hugging Face Transformers pipeline
Optimized for Chinese-language transcription
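A minimal transcription sketch using the Transformers `pipeline` API. It assumes the model is published on the Hugging Face Hub as `BELLE-2/Belle-whisper-large-v3-zh` (inferred from the author and model name above) and a recent Transformers version that accepts Whisper's `language` and `task` options through `generate_kwargs`. The helper names `build_transcriber` and `transcribe` are hypothetical.

```python
from typing import Any, Callable

# Assumed Hugging Face Hub ID, inferred from the author and model name.
MODEL_ID = "BELLE-2/Belle-whisper-large-v3-zh"

# Ask Whisper to transcribe Chinese instead of auto-detecting the language
# or translating to English.
GENERATE_KWARGS = {"language": "zh", "task": "transcribe"}


def build_transcriber(model_id: str = MODEL_ID, device: int = -1) -> Callable[..., Any]:
    """Create an ASR pipeline (hypothetical helper); device=-1 runs on CPU."""
    # Imported lazily so this module loads even without Transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id, device=device)


def transcribe(transcriber: Callable[..., Any], audio: Any) -> str:
    """Transcribe a file path or a {"raw": array, "sampling_rate": 16000} dict."""
    result = transcriber(audio, generate_kwargs=dict(GENERATE_KWARGS))
    return result["text"]
```

For example, `transcribe(build_transcriber(), "my_audio.wav")` (a hypothetical file) downloads the weights on first use and returns the recognized text. Decoding audio from a file path relies on `ffmpeg` being installed.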

Core Capabilities

Achieves 2.781% CER on AISHELL-1 test set
Performs strongly on meeting recordings (11.246% CER on the WenetSpeech meeting test set)
Handles diverse acoustic environments effectively
Integrates with standard ASR pipelines such as the Transformers automatic-speech-recognition pipeline
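CER counts character-level substitutions (S), deletions (D), and insertions (I) against the reference transcript and divides by the number of reference characters N: CER = (S + D + I) / N. The sketch below computes it with a standard Levenshtein distance, summing errors and reference characters over the whole test set (a common convention, assumed here). It is illustrative only: the text normalization behind the published figures (punctuation, spacing, character variants) is not stated, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match)
            )
        prev = curr
    return prev[-1]


def corpus_cer(refs: list[str], hyps: list[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# One substituted character out of six reference characters gives a CER of 1/6.
print(corpus_cer(["今天天气很好"], ["今天天气真好"]))
```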

Frequently Asked Questions

Q: What makes this model unique?

Its main distinction is substantially better accuracy on Chinese ASR tasks, particularly in challenging acoustic conditions such as meeting recordings. Compared with the base whisper-large-v3 model, it reduces error rates by up to 65% on some benchmarks.
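The "up to 65%" figure is most naturally read as a relative reduction in error rate, (base CER - new CER) / base CER; that interpretation is an assumption, since the exact formula is not stated. Under it, the arithmetic looks like this, with hypothetical example values rather than the published baseline numbers:

```python
def relative_reduction(base_error: float, new_error: float) -> float:
    """Fractional error reduction of new_error relative to base_error.

    0.65 means the new error rate is 65% lower than the baseline.
    """
    if base_error <= 0:
        raise ValueError("base_error must be positive")
    return (base_error - new_error) / base_error


# Hypothetical values, not published results: a baseline CER of 10.0%
# reduced to 3.5% is a 65% relative reduction.
print(f"{relative_reduction(10.0, 3.5):.0%}")  # prints 65%
```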

Q: What are the recommended use cases?

The model suits Chinese speech recognition tasks that demand high accuracy, such as meeting transcription and general speech-to-text conversion, as well as applications that must stay robust across varied acoustic conditions. It is especially useful for acoustically difficult audio, where general-purpose ASR systems may struggle.