NVIDIA Conformer-CTC Large (Esperanto)
| Property | Value |
|---|---|
| Parameter Count | 120M |
| Architecture | Conformer-CTC |
| License | CC-BY-4.0 |
| Paper | Conformer: Convolution-augmented Transformer for Speech Recognition |
| Test WER | 4.8% |
What is stt_eo_conformer_ctc_large?
This is an automatic speech recognition (ASR) model for the Esperanto language. It is based on the Conformer architecture, which combines convolution layers with transformer self-attention so that both local and long-range patterns in speech are modeled. NVIDIA developed the model by fine-tuning an English model pretrained with self-supervised learning (SSL) on the Mozilla Common Voice Esperanto 11.0 dataset.
Implementation Details
The model uses a non-autoregressive Conformer architecture with CTC loss and CTC decoding, and has approximately 120 million parameters. It takes 16 kHz mono-channel audio as input and outputs lowercase Esperanto text. The model was trained with NVIDIA's NeMo toolkit and is compatible with NVIDIA Riva for production deployments.
- Uses a SentencePiece tokenizer with a vocabulary size of 128
- Trained on ~250 hours of Esperanto speech data
- Achieves 2.9% WER on the dev set and 4.8% WER on the test set
- Integrates with NeMo toolkit for easy inference and fine-tuning
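The non-autoregressive CTC decoding mentioned above can be illustrated with a minimal greedy decoder: each frame is classified independently, consecutive repeats are merged, and blank tokens are dropped. This is only a sketch under stated assumptions, not NeMo's implementation. The six-character vocabulary, the `one_hot` helper, and the choice to put the blank token at the last index are all hypothetical; the real model uses its 128-entry SentencePiece vocabulary.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int) -> List[int]:
    """Collapse per-frame argmax predictions into a token-id sequence."""
    decoded: List[int] = []
    previous = blank_id
    for frame in frame_scores:
        # Each frame is classified on its own; no earlier output is fed back in.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit a token only if it is not blank and not a repeat of the previous frame.
        if best != blank_id and best != previous:
            decoded.append(best)
        previous = best
    return decoded


def one_hot(index: int, size: int) -> List[float]:
    """Hypothetical helper that fakes a confident per-frame score vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy vocabulary for illustration only (assumption, not the model's tokenizer).
TOY_VOCAB = ["s", "a", "l", "u", "t", "o"]
BLANK = len(TOY_VOCAB)  # assumption: blank token sits at the last index

# Frame-level predictions: s s _ a l l _ u t o o
frame_ids = [0, 0, BLANK, 1, 2, 2, BLANK, 3, 4, 5, 5]
scores = [one_hot(i, BLANK + 1) for i in frame_ids]

token_ids = ctc_greedy_decode(scores, BLANK)
text = "".join(TOY_VOCAB[i] for i in token_ids)
print(text)
```

Because every frame is decoded in a single pass without conditioning on earlier outputs, this style of decoding is cheap compared with autoregressive decoders, at the cost of ignoring dependencies between output tokens.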
Core Capabilities
- Real-time speech transcription for Esperanto
- Support for both single-file and batch processing
- Easy integration with NVIDIA Riva for production deployment
- Compatible with standard audio formats (WAV)
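As a rough sketch of batch transcription, the snippet below first checks that each file is 16 kHz mono WAV (the input format stated above) and then passes the list to NeMo. The helper names are hypothetical, and the identifier `nvidia/stt_eo_conformer_ctc_large` assumes the model is published under that name on a hub NeMo can reach. The NeMo import sits inside the function so the format check can be used without the toolkit installed; the return type of `transcribe` differs between NeMo versions, so it is returned unchanged.

```python
import wave
from typing import List

EXPECTED_RATE = 16_000   # Hz, as stated for this model
EXPECTED_CHANNELS = 1    # mono


def is_16k_mono_wav(path: str) -> bool:
    """Return True if the WAV header reports 16 kHz, single-channel audio."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def transcribe_files(paths: List[str],
                     model_name: str = "nvidia/stt_eo_conformer_ctc_large"):
    """Validate inputs, then transcribe them in one batch call (sketch)."""
    unsupported = [p for p in paths if not is_16k_mono_wav(p)]
    if unsupported:
        raise ValueError(f"Expected 16 kHz mono WAV, got: {unsupported}")

    # Imported lazily: requires the NeMo toolkit with its ASR collection.
    import nemo.collections.asr as nemo_asr

    # model_name is an assumption; adjust it to wherever the checkpoint lives.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name)
    # A single-element list covers the single-file case.
    return model.transcribe(paths)
```

Files recorded at another sample rate or in stereo would need to be resampled and downmixed before being passed in.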
Frequently Asked Questions
Q: What makes this model unique?
The model targets Esperanto speech recognition specifically and was obtained by transfer learning from an English SSL-pretrained model rather than by training from scratch. Its Conformer architecture pairs convolutional layers with transformer self-attention.
Q: What are the recommended use cases?
The model is suited to Esperanto speech transcription, particularly where accuracy matters; it reports 4.8% WER on the test set. For production use, it can be deployed through NVIDIA Riva.
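To judge whether the accuracy is sufficient for a particular application, word error rate (WER) can be measured on your own recordings. The sketch below computes WER as word-level edit distance divided by the number of reference words. The function names and the two example sentences are illustrative assumptions, and references are lowercased because the model outputs lowercase text; the evaluation script behind the reported figures may normalize text differently.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words over a corpus."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.lower().split()
        errors += edit_distance(ref_words, hyp.lower().split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return errors / total_words


# Illustrative data only: one substituted word out of six reference words.
refs = ["Mi parolas Esperanton", "La suno brilas"]
hyps = ["mi parolas esperanto", "la suno brilas"]
wer = word_error_rate(refs, hyps)
print(f"WER: {wer:.1%}")
```

Summing errors and words over the whole set, rather than averaging per-utterance rates, keeps short utterances from dominating the result.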