mandarin-wav2vec2-aishell1

Maintained By
kehanlu

  • Author: kehanlu
  • Research paper: Context-aware knowledge transferring strategy for CTC-based ASR
  • License: AISHELL-2 License (free for academic use)
  • Performance: 5.13% CER on the AISHELL-1 test set

What is mandarin-wav2vec2-aishell1?

This is a Mandarin speech recognition model built on the wav2vec 2.0 architecture. It is pre-trained on the 1,000-hour AISHELL-2 corpus and fine-tuned on the 178-hour AISHELL-1 corpus, a combination that makes it well suited to Mandarin transcription tasks.

Implementation Details

The model implements a modified wav2vec2 architecture with an additional LayerNorm layer between the encoder output and the CTC classification head. It was fine-tuned with the ESPnet toolkit and then converted to Hugging Face Transformers format for easier deployment.
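The extra LayerNorm can be added by wrapping the stock CTC head. The sketch below is an illustration of the idea, not the exact class shipped with the checkpoint, and the tiny config sizes are hypothetical values chosen for a quick structural check:

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Config, Wav2Vec2ForCTC

class ExtendedWav2Vec2ForCTC(Wav2Vec2ForCTC):
    """Wav2vec2 CTC model with a LayerNorm inserted between the encoder
    output and the CTC classification head (sketch of the modification
    described above; the upstream class may differ in detail)."""
    def __init__(self, config):
        super().__init__(config)
        # Wrap the existing linear head so LayerNorm runs first.
        self.lm_head = nn.Sequential(
            nn.LayerNorm(config.hidden_size),
            self.lm_head,
        )

# Structural check with a tiny randomly initialised config
# (hypothetical sizes, not the released checkpoint's).
cfg = Wav2Vec2Config(
    hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
    intermediate_size=64, conv_dim=[32], conv_stride=[2], conv_kernel=[3],
    num_conv_pos_embeddings=16, num_conv_pos_embedding_groups=2,
    vocab_size=10,
)
model = ExtendedWav2Vec2ForCTC(cfg).eval()
with torch.no_grad():
    logits = model(torch.randn(1, 400)).logits  # (batch, frames, vocab)
```

For real use, the same class would be instantiated via `from_pretrained` so the converted weights load into the wrapped head.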

  • Pre-trained on AISHELL-2 (1000 hours)
  • Fine-tuned on AISHELL-1 (178 hours)
  • Uses CTC-based ASR approach
  • Implements custom ExtendedWav2Vec2ForCTC architecture

Core Capabilities

  • Achieves 4.85% CER on dev set and 5.13% CER on test set
  • Supports real-time Mandarin speech recognition
  • Integrates seamlessly with Hugging Face's transformers library
  • Optimized for academic and research applications
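The CER figures above are edit-distance based: the number of character substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch of that metric:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance over reference length."""
    r, h = list(reference), list(hypothesis)
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)
```

For example, a single wrong character in a five-character reference gives a CER of 0.2 (20%).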

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its context-aware knowledge transferring strategy and the additional LayerNorm layer in its architecture, which helps improve recognition accuracy for Mandarin speech.

Q: What are the recommended use cases?

This model is ideal for academic research in Mandarin speech recognition, particularly for applications requiring high accuracy in transcription tasks. It's specifically designed for academic use and requires permission for commercial applications.
