# mandarin-wav2vec2-aishell1
| Property | Value |
|---|---|
| Author | kehanlu |
| Research Paper | Context-aware knowledge transferring strategy for CTC-based ASR |
| License | AISHELL-2 License (free for academic use) |
| Performance | 5.13% CER on the test set |
## What is mandarin-wav2vec2-aishell1?
This is a Mandarin automatic speech recognition (ASR) model built on the wav2vec 2.0 architecture. It is pre-trained on 1,000 hours of the AISHELL-2 dataset and fine-tuned on 178 hours of the AISHELL-1 dataset, making it particularly robust for Mandarin speech recognition tasks.
## Implementation Details
The model implements a modified wav2vec 2.0 architecture with an additional LayerNorm layer between the encoder output and the CTC classification head. It was fine-tuned with the ESPnet toolkit and converted to Hugging Face format for easier deployment.
- Pre-trained on AISHELL-2 (1000 hours)
- Fine-tuned on AISHELL-1 (178 hours)
- Uses CTC-based ASR approach
- Implements custom ExtendedWav2Vec2ForCTC architecture
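The actual `ExtendedWav2Vec2ForCTC` class is defined in the author's repository; the sketch below only illustrates the architectural idea described above — an extra LayerNorm inserted between the encoder output and the linear CTC head. Shapes and variable names are toy values of our own, not the model's real dimensions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each frame's hidden vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def ctc_head(hidden_states, weight, bias):
    """Project encoder outputs to per-frame vocabulary logits.

    The modification described above: LayerNorm is applied to the
    encoder output *before* the linear CTC classification head.
    """
    normed = layer_norm(hidden_states)
    return normed @ weight + bias

# Toy dimensions: 4 frames of 8-dim encoder output, 6-symbol vocabulary.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 6))
b = np.zeros(6)
logits = ctc_head(hidden, W, b)
print(logits.shape)  # (4, 6): one logit vector per frame
```

In the real model this projection runs inside PyTorch as part of the Hugging Face `Wav2Vec2ForCTC` forward pass; the numpy version above just makes the placement of the extra LayerNorm explicit.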
## Core Capabilities
- Achieves 4.85% CER on the dev set and 5.13% CER on the test set
- Supports real-time Mandarin speech recognition
- Integrates seamlessly with Hugging Face's transformers library
- Optimized for academic and research applications
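To make the CTC approach above concrete: at inference time the CTC head emits one symbol id per frame, and greedy (best-path) decoding turns that frame sequence into text by collapsing repeats and dropping the blank symbol. The vocabulary and frame ids below are toy stand-ins, not the model's real character inventory.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Standard CTC best-path decoding: collapse repeats, then drop blanks."""
    out = []
    prev = None
    for i in frame_ids:
        # Keep a symbol only when it differs from the previous frame
        # and is not the blank token.
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# Toy two-character vocabulary standing in for the model's tokenizer.
vocab = {1: "你", 2: "好"}
frames = [0, 1, 1, 0, 2, 2, 2, 0]  # per-frame argmax ids from the CTC head
print("".join(vocab[i] for i in ctc_greedy_decode(frames)))  # 你好
```

Note that a blank between two identical ids separates them ("1, 0, 1" decodes to two symbols), which is how CTC represents doubled characters.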
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinguishing features are its context-aware knowledge-transferring strategy and the additional LayerNorm layer between the encoder output and the CTC head, which improves recognition accuracy for Mandarin speech.
**Q: What are the recommended use cases?**
This model is ideal for academic research in Mandarin speech recognition, particularly for applications requiring high accuracy in transcription tasks. It's specifically designed for academic use and requires permission for commercial applications.