Icefall ASR TAL-CSASR Pruned Transducer Stateless5

Property	Value
Author	luomingshuang
Framework	Icefall/K2
Dataset	TAL_CSASR
Training Duration	30 epochs

What is icefall_asr_tal-csasr_pruned_transducer_stateless5?

This is an automatic speech recognition (ASR) model built with the Icefall framework and trained on the TAL_CSASR dataset. It is a transducer trained with the pruned RNN-T loss and a stateless decoder (prediction network), designed to recognize both Chinese and English speech.

Implementation Details

The model is implemented with the Icefall/K2 toolkit and was trained for 30 epochs on far-field audio data. It uses a pruned transducer architecture with a stateless decoder, meaning the prediction network conditions only on a fixed number of previously emitted tokens instead of a recurrent state, which keeps inference efficient while maintaining accuracy.
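
To make the "stateless" part concrete, the toy sketch below runs a greedy search in which the decoder sees only the last few emitted tokens. Everything in it is an assumption made for illustration: `toy_joiner`, `CONTEXT_SIZE`, the blank id, and the score table are hypothetical and are not Icefall's API or this model's configuration.

```python
from typing import Callable, List, Sequence

BLANK = 0          # assumed blank token id
CONTEXT_SIZE = 2   # assumed number of previous tokens the decoder sees


def greedy_search(
    frames: Sequence[int],
    joiner: Callable[[int, Sequence[int]], List[float]],
    max_sym_per_frame: int = 1,
) -> List[int]:
    """Toy transducer greedy search with a stateless (fixed-context) decoder."""
    hyp = [BLANK] * CONTEXT_SIZE  # left-pad the context with blanks
    for frame in frames:
        emitted = 0
        while emitted < max_sym_per_frame:
            context = hyp[-CONTEXT_SIZE:]  # only recent tokens, no RNN state
            scores = joiner(frame, context)
            token = max(range(len(scores)), key=scores.__getitem__)
            if token == BLANK:
                break  # blank wins: move on to the next frame
            hyp.append(token)
            emitted += 1
    return hyp[CONTEXT_SIZE:]  # drop the padding


def toy_joiner(frame: int, context: Sequence[int]) -> List[float]:
    """Hypothetical scorer: favours the frame's label unless it was just emitted."""
    scores = [0.5, 0.0, 0.0, 0.0]  # vocabulary: blank + 3 tokens
    if frame != BLANK and context[-1] != frame:
        scores[frame] = 1.0
    return scores


decoded = greedy_search([1, 1, 0, 2, 3, 3], toy_joiner)
print(decoded)  # [1, 2, 3]
```

In a real model each frame would be an encoder output vector and the joiner a neural network; the point here is only that the decoding loop carries no hidden state beyond the recent token history.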

Trained using distributed training across 6 GPUs
Supports multiple decoding methods including greedy search, modified beam search, and fast beam search
Implements model averaging for improved performance
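
Model averaging can be sketched as an element-wise mean of parameters across several saved checkpoints. The example below is a minimal illustration using plain Python lists; Icefall works on PyTorch state dicts, and the parameter names, values, and number of checkpoints here are made up, since the source does not say how many were averaged.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values


def average_checkpoints(checkpoints: List[Params]) -> Params:
    """Return the element-wise mean of parameters across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Params = {}
    for name, first in checkpoints[0].items():
        # Sum each position over all checkpoints, then divide by the count.
        averaged[name] = [
            sum(ckpt[name][i] for ckpt in checkpoints) / n
            for i in range(len(first))
        ]
    return averaged


# Hypothetical parameters from three epochs (values are made up).
epochs = [
    {"encoder.w": [1.0, 2.0], "joiner.b": [0.0]},
    {"encoder.w": [2.0, 4.0], "joiner.b": [3.0]},
    {"encoder.w": [3.0, 6.0], "joiner.b": [6.0]},
]
avg = average_checkpoints(epochs)
print(avg)  # {'encoder.w': [2.0, 4.0], 'joiner.b': [3.0]}
```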

Core Capabilities

Overall error rate (Chinese and English combined): 7.15% on the dev set and 7.22% on the test set, using modified beam search with the averaged model
Chinese is scored by character error rate (CER) and English by word error rate (WER)
Chinese performance: 6.35% CER on dev set, 6.50% CER on test set
English performance: 18.95% WER on dev set, 18.70% WER on test set
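
For readers unfamiliar with how CER and WER relate on mixed Chinese-English text, the sketch below computes an error rate as edit distance divided by reference length, treating each Chinese character and each English word as one token. The tokenization rule and the sample sentences are assumptions for illustration; the source does not describe the scoring script behind the figures above.

```python
import re
from typing import List


def tokenize_mixed(text: str) -> List[str]:
    """Split into single CJK characters and whole English words."""
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(ref: str, hyp: str) -> float:
    """Edit distance between token sequences, normalized by reference length."""
    ref_tokens = tokenize_mixed(ref)
    if not ref_tokens:
        raise ValueError("reference has no tokens")
    return edit_distance(ref_tokens, tokenize_mixed(hyp)) / len(ref_tokens)


# Made-up example: the hypothesis drops one Chinese character.
ref = "我们 今天 学习 machine learning"
hyp = "我们 今天 学 machine learning"
rate = error_rate(ref, hyp)
print(f"{rate:.3f}")  # 0.125 (1 error over 8 reference tokens)
```

Restricting the token lists to Chinese characters only, or to English words only, gives the per-language CER and WER in the same way.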

Frequently Asked Questions

Q: What makes this model unique?

The model combines a pruned transducer architecture with a stateless decoder, giving efficient inference for both Chinese and English speech recognition. It reaches a 7.22% overall error rate on the TAL_CSASR test set and supports greedy search, modified beam search, and fast beam search decoding.

Q: What are the recommended use cases?

This model is suited to far-field speech recognition applications that need bilingual (Chinese-English) support, particularly where both accuracy and inference efficiency matter.