Belle-whisper-large-v3-zh

Maintained By
BELLE-2


Property     Value
License      Apache-2.0
Author       BELLE-2
Framework    PyTorch, Transformers
Task         Automatic Speech Recognition

What is Belle-whisper-large-v3-zh?

Belle-whisper-large-v3-zh is a Chinese speech recognition model built on OpenAI's Whisper large-v3 architecture. After full fine-tuning on major Chinese speech datasets, it substantially improves Chinese ASR accuracy, with relative character error rate (CER) improvements of 24-65% over the base model across various benchmarks.
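Improvements such as the 24-65% figure are usually stated as a relative reduction in error rate: the drop in error divided by the base model's error on the same test set. The minimal sketch below shows the arithmetic; the CER values in it are hypothetical, chosen only for illustration, and are not results reported for this model.

```python
def relative_reduction(base_error: float, new_error: float) -> float:
    """Fraction by which new_error improves on base_error (0.65 means 65% lower)."""
    if base_error <= 0:
        raise ValueError("base_error must be positive")
    return (base_error - new_error) / base_error


# Hypothetical CERs (in %) for illustration only; not reported results.
base_cer, tuned_cer = 10.0, 3.5
print(f"{relative_reduction(base_cer, tuned_cer):.0%}")  # prints 65%
```

Because the reduction is relative, the same percentage can correspond to very different absolute changes depending on how hard the benchmark is.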

Implementation Details

The model was fully fine-tuned (all parameters updated) on several high-quality Chinese speech datasets: AISHELL-1, AISHELL-2, WenetSpeech, and HKUST. It expects 16 kHz audio input and can be loaded with the Hugging Face Transformers library.

  • Substantial accuracy gains on Chinese ASR benchmarks
  • Improved accuracy in complex acoustic environments
  • Easy integration through the Hugging Face Transformers pipeline
  • Optimized for Chinese-language transcription
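A minimal sketch of running the model through the Transformers ASR pipeline at the required 16 kHz rate. It assumes the Hub repository ID `BELLE-2/Belle-whisper-large-v3-zh`, a recent Transformers release that accepts `language` and `task` through `generate_kwargs`, and that `numpy` and `transformers` are installed. `resample_linear` and `transcribe` are hypothetical helpers; the resampler is deliberately crude (linear interpolation, no anti-aliasing filter), so a proper resampler is preferable for real audio.

```python
from typing import List, Sequence

TARGET_SR = 16_000  # the model expects 16 kHz input
MODEL_ID = "BELLE-2/Belle-whisper-large-v3-zh"  # assumed Hub repository ID


def resample_linear(samples: Sequence[float], src_sr: int, dst_sr: int = TARGET_SR) -> List[float]:
    """Crude linear-interpolation resampler (hypothetical helper, no anti-aliasing)."""
    if src_sr <= 0 or dst_sr <= 0:
        raise ValueError("sample rates must be positive")
    if src_sr == dst_sr or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_sr / src_sr))
    ratio = src_sr / dst_sr
    out = []
    for i in range(n_out):
        pos = i * ratio  # position measured in source samples
        j = int(pos)
        if j >= len(samples) - 1:
            out.append(float(samples[-1]))  # hold the last sample at the edge
        else:
            frac = pos - j
            out.append(float(samples[j]) * (1 - frac) + float(samples[j + 1]) * frac)
    return out


def transcribe(samples: Sequence[float], src_sr: int) -> str:
    """Resample to 16 kHz, then run the model (downloads weights on first use)."""
    import numpy as np
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    audio = np.asarray(resample_linear(samples, src_sr), dtype=np.float32)
    result = asr(
        {"raw": audio, "sampling_rate": TARGET_SR},
        # Force Chinese transcription instead of relying on language detection.
        generate_kwargs={"language": "zh", "task": "transcribe"},
    )
    return result["text"]
```

Forcing `language="zh"` and `task="transcribe"` means the output does not depend on Whisper's automatic language detection and is never a translation.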

Core Capabilities

  • Achieves 2.781% CER on the AISHELL-1 test set
  • Achieves 11.246% CER on the WenetSpeech meeting test set, a challenging meeting-recording benchmark
  • Handles diverse acoustic environments effectively
  • Seamless integration with standard ASR pipelines
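CER is the character-level edit distance (substitutions + deletions + insertions) between the model output and the reference transcript, divided by the number of reference characters. A 2.781% CER therefore means roughly 2.8 character errors per 100 reference characters. The minimal sketch below computes it; the official evaluation may apply its own text normalization (punctuation, numerals, script conversion), which is not reproduced here, and ignoring whitespace is a simplifying assumption.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference char r
                cur[j - 1] + 1,          # insert hypothesis char h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, with all whitespace removed before comparing."""
    ref = "".join(ref.split())
    hyp = "".join(hyp.split())
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


print(f"{cer('今天天气很好', '今天天汽很好'):.2%}")  # one substitution in 6 chars: prints 16.67%
```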

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its much better accuracy on Chinese ASR tasks, particularly in challenging acoustic environments such as meeting recordings. Compared with the base whisper-large-v3 model, it reduces error rates by up to 65% (relative) on some benchmarks.

Q: What are the recommended use cases?

The model is well suited to Chinese speech recognition tasks that demand high accuracy, such as meeting transcription, general speech-to-text conversion, and applications that must stay robust across varied acoustic conditions. It is especially useful for acoustically challenging audio such as meeting recordings.
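Meeting recordings usually run far longer than the 30-second window Whisper processes at once. One common approach, sketched here under stated assumptions, is to let the Transformers pipeline chunk the audio via `chunk_length_s`. The hypothetical `window_bounds` helper only illustrates how overlapping 30-second windows cover a long recording; it is not what the pipeline does internally. The repository ID `BELLE-2/Belle-whisper-large-v3-zh` is an assumption, and passing a file path to the pipeline requires ffmpeg.

```python
from typing import List, Tuple

SR = 16_000  # model input sample rate


def window_bounds(n_samples: int, sr: int = SR, chunk_s: float = 30.0,
                  overlap_s: float = 5.0) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) sample indices of overlapping windows."""
    chunk = int(chunk_s * sr)
    step = chunk - int(overlap_s * sr)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk must be positive and longer than the overlap")
    bounds = []
    start = 0
    while True:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end >= n_samples:  # last window reaches the end of the recording
            break
        start += step
    return bounds


def transcribe_long(path: str) -> str:
    """Transcribe a long file; the pipeline handles chunking and stitching."""
    from transformers import pipeline  # imported lazily; needs transformers + ffmpeg

    asr = pipeline(
        "automatic-speech-recognition",
        model="BELLE-2/Belle-whisper-large-v3-zh",  # assumed repository ID
        chunk_length_s=30,
    )
    result = asr(path, generate_kwargs={"language": "zh", "task": "transcribe"})
    return result["text"]
```

The overlap gives each chunk some context at its edges so that words cut at a boundary can be recovered when the pieces are merged.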
