TeleSpeech-ASR1.0
Property | Value |
---|---|
License | Apache-2.0 |
Base Model Size | 0.09B parameters |
Large Model Size | 0.3B parameters |
Training Data | 300K hours of unlabeled multi-dialect speech (pre-training) + labeled data covering 30 dialects (fine-tuning) |
What is TeleSpeech-ASR1.0?
TeleSpeech-ASR1.0 is a multi-dialect speech recognition model developed by Tele-AI. Conventional ASR models are typically limited to a single dialect; TeleSpeech-ASR1.0 recognizes multiple Chinese dialects with one model. It is pre-trained on 300,000 hours of unlabeled multi-dialect speech and fine-tuned on labeled data covering 30 dialects.
Implementation Details
The model is released in three variants: two pre-trained models (base and large) and one fine-tuned model. The base model has 0.09B parameters and the large model has 0.3B parameters. The fine-tuned model is trained on the KeSpeech dataset, which covers 8 major Chinese dialects.
- Pre-trained base model: 0.09B parameters for feature extraction
- Pre-trained large model: 0.3B parameters, the higher-capacity pre-trained variant
- Fine-tuned KeSpeech model: fine-tuned on KeSpeech for direct use in dialect recognition
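To put the parameter counts in practical terms, the sketch below estimates how much memory the raw weights of each pre-trained variant would occupy. It is a back-of-the-envelope calculation only: the helper name `weight_memory_mib` and the dtype table are illustrative assumptions, the parameter counts are the rounded figures quoted above, and activations, optimizer state, and framework overhead are ignored.

```python
# Rough weight-memory estimate from the rounded parameter counts above.
# Assumption: every parameter is stored in a single dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mib(num_params: float, dtype: str = "float32") -> float:
    """Return the approximate size of the weights in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# Parameter counts as stated in this document (0.09B and 0.3B).
VARIANTS = {"base": 0.09e9, "large": 0.3e9}

for name, params in VARIANTS.items():
    fp32 = weight_memory_mib(params, "float32")
    fp16 = weight_memory_mib(params, "float16")
    print(f"{name}: ~{fp32:.0f} MiB in float32, ~{fp16:.0f} MiB in float16")
```

Under these assumptions, the base model's weights come to a few hundred MiB and the large model's to roughly 1.1 GiB in float32; actual checkpoint sizes may differ.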
Core Capabilities
- Multi-dialect recognition spanning 30 Chinese dialects
- Support for major dialects including Cantonese, Shanghainese, Sichuanese, and Wenzhounese
- Character Error Rate (CER) as low as 4.0% on Aishell-1
- Evaluated on several further test sets, including WenetSpeech and Babel
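For readers unfamiliar with the metric quoted above, CER is the character-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference characters. The following is a minimal, dependency-free sketch of that standard definition. It is not the project's evaluation script: the function names `edit_distance` and `cer` are illustrative, and stripping spaces before scoring is an assumption that real scoring pipelines may handle differently (for example, with additional text normalization).

```python
from typing import Iterable


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references: Iterable[str], hypotheses: Iterable[str]) -> float:
    """Corpus-level CER: total edit errors / total reference characters."""
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: whitespace is not scored.
        ref = ref.replace(" ", "")
        hyp = hyp.replace(" ", "")
        total_errors += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


# One substituted character in a six-character reference.
score = cer(["今天天气很好"], ["今天天汽很好"])
print(f"CER = {score:.2%}")
```

A CER of 4.0% therefore corresponds to about four character errors (substitutions, insertions, or deletions) per 100 reference characters.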
Frequently Asked Questions
Q: What makes this model unique?
A single model handles multiple dialects, whereas traditional ASR systems are built for one dialect each. Pre-training on 300K hours of unlabeled data also gives the model robust feature extraction capabilities.
Q: What are the recommended use cases?
The model is intended for applications that need multi-dialect Chinese speech recognition, particularly where speakers use regional dialects. It is suitable for both academic research and commercial applications, though commercial use requires specific licensing.