Belle-whisper-large-v3-zh

BELLE-2

A fine-tuned Whisper large-v3 model optimized for Chinese ASR, with error rates 24-65% lower than the base whisper-large-v3 model on major benchmarks such as AISHELL and WenetSpeech.

Property    Value
License     Apache-2.0
Author      BELLE-2
Framework   PyTorch, Transformers
Task        Automatic Speech Recognition

What is Belle-whisper-large-v3-zh?

Belle-whisper-large-v3-zh is a Chinese speech recognition model built on OpenAI's Whisper large-v3 architecture. After full fine-tuning on major Chinese speech datasets, it reduces error rates by 24-65% relative to the base model, depending on the benchmark.

Implementation Details

The model was fully fine-tuned on multiple high-quality Chinese speech datasets, including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST. It expects audio sampled at 16 kHz and can be loaded through the Transformers library for easy deployment.

  • Significant performance improvements on Chinese ASR benchmarks
  • Specialized for complex acoustic environments
  • Easy integration through Hugging Face Transformers pipeline
  • Tuned for the transcription task on Chinese speech
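
A minimal sketch of the pipeline integration mentioned above. The Hub identifier `BELLE-2/Belle-whisper-large-v3-zh` is inferred from the author and model name on this page, and `build_transcriber`, `transcribe`, and the file name in the usage note are illustrative names, not part of any library. It also assumes a Transformers version whose Whisper generation accepts `language` and `task` keyword arguments.

```python
# Assumed Hub identifier, inferred from the author and model name on this page.
MODEL_ID = "BELLE-2/Belle-whisper-large-v3-zh"

# Generation options that pin Whisper to Chinese transcription
# rather than language auto-detection or translation.
GENERATE_KWARGS = {"language": "zh", "task": "transcribe"}


def build_transcriber(model_id=MODEL_ID):
    """Create a Transformers ASR pipeline for the fine-tuned checkpoint."""
    # Imported lazily so the helpers below can be used (and tested)
    # without loading the heavy dependency or downloading the weights.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path, transcriber=None):
    """Return the recognized text for one audio file."""
    if transcriber is None:
        transcriber = build_transcriber()
    result = transcriber(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]
```

Calling `transcribe("my_audio.wav")` would download the checkpoint on first use and return the transcript as a string. When transcribing many files, building the pipeline once with `build_transcriber()` and passing it in as `transcriber` avoids reloading the weights on every call.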

Core Capabilities

  • Achieves a 2.781% character error rate (CER) on the AISHELL-1 test set
  • Reaches 11.246% CER on the WenetSpeech meeting test set, a challenging meeting-recording scenario
  • Handles diverse acoustic environments effectively
  • Seamless integration with standard ASR pipelines
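
The CER figures above are character error rates: the character-level edit distance between the reference transcript and the model output, divided by the reference length. The sketch below shows one way to compute it in plain Python. The function names are illustrative, and stripping whitespace before scoring is an assumption; published benchmark numbers typically rely on their own text normalization, so this will not necessarily reproduce them exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    # Assumption: whitespace is ignored when scoring.
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("今天天气很好", "今天天汽很好")` counts one substituted character out of six, about 16.7%.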

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its significantly improved performance on Chinese ASR tasks, particularly in challenging acoustic environments like meeting recordings. It offers a substantial improvement over the base whisper-large-v3 model, with error rate reductions of up to 65% in some scenarios.
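
An error rate reduction of this kind is a relative figure: the drop in error rate divided by the baseline error rate. A small sketch of the arithmetic follows. The function name and the numbers in the usage note are hypothetical and chosen only to illustrate the calculation; they are not benchmark results.

```python
def relative_reduction(baseline_cer, finetuned_cer):
    """Fraction by which the fine-tuned model lowers the baseline error rate."""
    if baseline_cer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_cer - finetuned_cer) / baseline_cer
```

With a hypothetical baseline CER of 10.0% and a fine-tuned CER of 3.5%, `relative_reduction(10.0, 3.5)` gives 0.65, that is, a 65% relative reduction even though the absolute drop is 6.5 percentage points.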

Q: What are the recommended use cases?

The model suits Chinese speech recognition tasks that need high accuracy, such as meeting transcription and general speech-to-text conversion. It is a particularly good fit for complex audio environments, where traditional ASR systems might struggle.
