Icefall ASR TAL-CSASR Pruned Transducer Stateless5
Property | Value |
---|---|
Author | luomingshuang |
Framework | Icefall/K2 |
Dataset | TAL_CSASR |
Training Duration | 30 epochs |
What is icefall_asr_tal-csasr_pruned_transducer_stateless5?
This is an automatic speech recognition (ASR) model built with the Icefall framework and trained on the TAL_CSASR dataset. It uses a pruned transducer architecture with a stateless decoder, meaning the prediction network conditions only on the most recently emitted tokens rather than on a recurrent state, and it is designed to recognize both Chinese and English speech.
Implementation Details
The model is implemented with the K2 speech recognition toolkit and was trained for 30 epochs on far-field audio data. Its pruned transducer architecture uses a stateless decoder, which keeps inference efficient while maintaining accuracy.
- Trained using distributed training across 6 GPUs
- Supports multiple decoding methods including greedy search, modified beam search, and fast beam search
- Implements model averaging for improved performance
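The model averaging listed above can be illustrated with a minimal sketch. It assumes each checkpoint is a mapping from parameter name to a flat list of floats; real checkpoints hold PyTorch tensors, and `average_parameters` is a hypothetical helper written for this example, not a function from Icefall.

```python
from typing import Dict, List

# Hypothetical checkpoint format for this sketch: parameter name -> flat list of floats.
Checkpoint = Dict[str, List[float]]


def average_parameters(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the element-wise mean of the parameters across all checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint to average")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter from every checkpoint together.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


epoch_29 = {"w": [1.0, 2.0], "b": [0.0]}
epoch_30 = {"w": [3.0, 4.0], "b": [1.0]}
averaged = average_parameters([epoch_29, epoch_30])
print(averaged)  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Averaging the weights of several late checkpoints tends to smooth out noise from individual training steps, which is the usual motivation for decoding with an averaged model.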
Core Capabilities
- Overall: 7.15% CER on the dev set and 7.22% on the test set, using modified beam search with the averaged model
- Scores Chinese by character error rate (CER) and English by word error rate (WER)
- Chinese performance: 6.35% CER on the dev set, 6.50% CER on the test set
- English performance: 18.95% WER on the dev set, 18.70% WER on the test set
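To make the metrics above concrete, here is a rough sketch of how an error rate over mixed Chinese and English text could be computed. It assumes Chinese is scored per character and English per word, and that a simple regular expression is enough to split the two; the function names are hypothetical, and the tokenization used by the actual scoring scripts may differ.

```python
import re
from typing import List, Sequence


def tokenize_mixed(text: str) -> List[str]:
    """Split text into single Chinese characters and whole English words."""
    # Assumption: CJK Unified Ideographs count as one token each;
    # runs of Latin letters (and apostrophes) count as one word.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref_text: str, hyp_text: str) -> float:
    """Token-level edit distance divided by the number of reference tokens."""
    ref = tokenize_mixed(ref_text)
    hyp = tokenize_mixed(hyp_text)
    if not ref:
        raise ValueError("reference contains no tokens")
    return edit_distance(ref, hyp) / len(ref)


# One deleted character and one substituted word out of five reference tokens.
print(error_rate("我想要 hello world", "我想 hello word"))  # 0.4
```

Restricting the token list to Chinese characters only or English words only before scoring would give per-language figures analogous to the CER and WER breakdown above.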
Frequently Asked Questions
Q: What makes this model unique?
The model combines a pruned transducer architecture with a stateless decoder, which keeps inference efficient for both Chinese and English speech recognition. It supports greedy search, modified beam search, and fast beam search, and reaches 7.22% CER on the test set with modified beam search and the averaged model.
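Of the decoding strategies this model supports, greedy search is the simplest to sketch. The toy example below is an illustration only, not Icefall's implementation: the blank id, `context_size`, `max_sym_per_frame`, and `toy_joiner` are all assumptions made for the example, and each encoder frame is reduced to a single integer. It shows the key property of a stateless decoder, namely that the next prediction depends only on the last few emitted tokens.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumed blank token id for this sketch


def greedy_search(
    encoder_frames: Sequence[int],
    joiner: Callable[[int, Sequence[int]], List[float]],
    context_size: int = 2,
    max_sym_per_frame: int = 1,
) -> List[int]:
    """Pick the best-scoring token per frame, skipping blanks."""
    # A stateless decoder only sees the last `context_size` tokens,
    # so the hypothesis starts padded with blanks.
    hyp = [BLANK] * context_size
    for frame in encoder_frames:
        for _ in range(max_sym_per_frame):
            scores = joiner(frame, hyp[-context_size:])
            token = max(range(len(scores)), key=scores.__getitem__)
            if token == BLANK:
                break  # move on to the next frame
            hyp.append(token)
    return hyp[context_size:]  # drop the initial padding


def toy_joiner(frame: int, context: Sequence[int]) -> List[float]:
    """Hypothetical joiner: favors the frame's label unless it was just emitted."""
    scores = [0.0] * 4  # toy vocabulary: blank plus three tokens
    if frame != BLANK and context[-1] != frame:
        scores[frame] = 1.0
    else:
        scores[BLANK] = 1.0
    return scores


result = greedy_search([1, 1, 0, 2, 3, 3], toy_joiner)
print(result)  # [1, 2, 3]
```

Beam search variants keep several candidate hypotheses per frame instead of a single one, trading extra computation for lower error rates.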
Q: What are the recommended use cases?
This model is particularly suited for far-field speech recognition applications requiring bilingual (Chinese-English) capabilities. It's ideal for scenarios where both accuracy and inference efficiency are important considerations.