TeleSpeech-ASR1.0

Tele-AI

A multi-dialect speech recognition model pre-trained on 300K hours of unlabeled audio data, supporting 30 Chinese dialects including Cantonese, Shanghainese, and Sichuanese.

Property	Value
License	Apache-2.0
Base Model Size	0.09B parameters
Large Model Size	0.3B parameters
Training Data	300K hours unlabeled (pre-training) + labeled data in 30 dialects (fine-tuning)

What is TeleSpeech-ASR1.0?

TeleSpeech-ASR1.0 is a multi-dialect speech recognition model developed by Tele-AI. A single model recognizes multiple Chinese dialects, removing the usual restriction of one dialect per model. It is pre-trained on 300,000 hours of unlabeled multi-dialect speech data and fine-tuned on labeled data covering 30 dialects.

Implementation Details

The model is released in three variants: two pre-trained models (base and large) and one fine-tuned model. The base model has 0.09B parameters and the large model has 0.3B parameters. The fine-tuned version is trained on the KeSpeech dataset, which covers 8 dialects.

Pre-trained base model: 0.09B parameters, for feature extraction
Pre-trained large model: 0.3B parameters, the higher-capacity variant
Fine-tuned KeSpeech model: fine-tuned on KeSpeech for dialect recognition
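The parameter counts above give a rough lower bound on the memory needed just to hold the weights. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes every parameter is stored at a single precision (4 bytes for fp32, 2 bytes for fp16) and ignores activations, optimizer state, and framework overhead. The function and dictionary names are illustrative only.

```python
# Rough weight-memory estimate from parameter counts.
# Assumption: all parameters are stored at one precision; activations
# and runtime overhead are not included.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

# Parameter counts as listed for the two pre-trained variants.
MODEL_PARAMS = {"base": 0.09e9, "large": 0.3e9}


def weight_memory_gb(num_params: float, dtype: str = "fp32") -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        for dtype in BYTES_PER_PARAM:
            size = weight_memory_gb(params, dtype)
            print(f"{name:5s} {dtype}: {size:.2f} GB")
```

Under these assumptions the base weights come to roughly 0.36 GB and the large weights to roughly 1.2 GB in fp32, with half of each in fp16.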

Core Capabilities

Multi-dialect recognition spanning 30 Chinese dialects
Support for major dialects including Cantonese, Shanghainese, Sichuanese, and Wenzhounese
Character Error Rate (CER) as low as 4.0% on Aishell-1
Evaluated on additional test sets including WenetSpeech and Babel
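For readers unfamiliar with the metric, Character Error Rate is conventionally the character-level edit distance between the reference transcript and the model output, divided by the number of reference characters. The following is a minimal sketch of that standard definition, not the evaluation script used for the reported figures. It applies no text normalization (punctuation or whitespace stripping), which real evaluations usually do, and the function names are illustrative.

```python
# Character Error Rate (CER) via Levenshtein distance.
# Assumption: transcripts are compared as raw character sequences with
# no normalization step.


def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, insertions, and deletions
    needed to turn `ref` into `hyp`, counted per character."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    refs = ["今天天气很好", "你好"]
    hyps = ["今天天汽很好", "你好"]  # one substituted character
    print(f"CER = {cer(refs, hyps):.3f}")
```

A CER of 4.0% therefore corresponds to about four character-level errors per hundred reference characters.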

Frequently Asked Questions

Q: What makes this model unique?

A single model architecture handles multiple dialects, which sets it apart from traditional single-dialect ASR systems. Pre-training on 300K hours of unlabeled data gives it robust feature extraction capabilities.

Q: What are the recommended use cases?

The model suits applications that need multi-dialect Chinese speech recognition, particularly where speakers use regional dialect variations. It can be used for both academic research and commercial applications, though commercial use may require a separate license, so check the license terms in the model repository first.