stt_zh_conformer_transducer_large

Maintained By
nvidia

NVIDIA Conformer-Transducer Large (Mandarin)

Property: Value
Model Size: 120M parameters
Input Format: 16 kHz mono audio (WAV)
Vocabulary Size: 5026 characters
License: CC-BY-4.0
Training Dataset: AISHELL-2
Best WER: 5.3% (Test IOS)

What is stt_zh_conformer_transducer_large?

This is NVIDIA's large speech recognition model for Mandarin Chinese transcription. It is built on the Conformer-Transducer architecture, which pairs a convolution-augmented Transformer (Conformer) encoder with transducer-based decoding, and it reports a best WER of 5.3% on the AISHELL-2 iOS test set.

Implementation Details

The model is trained and run with the NeMo toolkit and uses character-based tokenization with a vocabulary of 5026 characters. It takes 16 kHz mono audio as input and outputs the transcription directly as Mandarin characters.
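
Because the model expects 16 kHz mono WAV input, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and its return convention are illustrative assumptions, not part of NeMo or of this model's tooling.

```python
import wave

EXPECTED_RATE = 16000    # Hz, the input rate stated for this model
EXPECTED_CHANNELS = 1    # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches.

    Hypothetical helper for illustration only. It inspects the WAV header
    and does not resample or convert anything.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.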

  • Trained on the AISHELL-2 dataset
  • Achieves 5.3-5.7% Word Error Rate across different test conditions
  • Uses an autoregressive transducer decoder and is trained with the transducer loss
  • Integrates through the NeMo toolkit
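
To make the error-rate figures above concrete, here is a minimal sketch of how such a rate can be computed from a reference and a hypothesis transcript. It assumes each Chinese character is treated as one token, which fits the character-based vocabulary but is an assumption about the scoring, not a description of NVIDIA's evaluation script. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance over characters, divided by the reference length."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of six reference characters.
rate = char_error_rate("今天天气很好", "今天天汽很好")
print(f"{rate:.1%}")
```

A reported rate of 5.3% would correspond, under this assumption, to roughly one character-level error per 19 reference characters.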

Core Capabilities

  • High-accuracy Mandarin speech transcription
  • Batch processing of multiple audio files
  • Simple Python API integration
  • Support for different audio input environments (iOS, Android, Mic)
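
The batch processing and Python API capabilities above can be sketched as follows. This is a hedged example, not the official usage: the identifier `nvidia/stt_zh_conformer_transducer_large` is assumed from the model name and maintainer shown on this page, `EncDecRNNTModel.from_pretrained` and `transcribe` follow the usual NeMo ASR pattern, and the return type of `transcribe` differs between NeMo versions, so the helper stores whatever the model returns for each file. The helper names `load_model` and `transcribe_files` are hypothetical.

```python
def load_model(name="nvidia/stt_zh_conformer_transducer_large"):
    """Load the pretrained model (assumed identifier, see the note above)."""
    # Imported lazily so transcribe_files can be exercised without NeMo installed.
    import nemo.collections.asr as nemo_asr
    return nemo_asr.models.EncDecRNNTModel.from_pretrained(name)


def transcribe_files(model, paths, batch_size=4):
    """Transcribe several audio files and return a {path: result} mapping."""
    paths = list(paths)
    results = model.transcribe(paths, batch_size=batch_size)
    # Assumption: some NeMo versions return a (best, all) tuple for
    # transducer models; in that case keep only the best hypotheses.
    if isinstance(results, tuple):
        results = results[0]
    return dict(zip(paths, results))
```

With NeMo installed, `transcribe_files(load_model(), ["a.wav", "b.wav"])` would map each file to its transcription; the file names here are placeholders.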

Frequently Asked Questions

Q: What makes this model unique?

This model combines the Conformer encoder with transducer-based decoding and is trained specifically for Mandarin Chinese. With 120M parameters and training on AISHELL-2, it reaches 5.3-5.7% WER across the reported test conditions (iOS, Android, Mic).

Q: What are the recommended use cases?

The model is ideal for Mandarin speech transcription in applications requiring high accuracy, such as automated transcription services, voice assistants, and speech analytics platforms. However, it may have limitations with technical terms or heavily accented speech not present in the training data.
