stt_uz_fastconformer_hybrid_large_pc

nvidia

NVIDIA FastConformer-Hybrid Large model for Uzbek speech recognition, with 115M parameters and a 16.46% word error rate (WER) on the Common Voice test set.

Property	Value
Parameter Count	115M
License	CC-BY-4.0
Architecture	FastConformer-Transducer CTC
Paper	Fast Conformer Paper
WER (Common Voice)	16.46%

What is stt_uz_fastconformer_hybrid_large_pc?

This is an automatic speech recognition (ASR) model built specifically for the Uzbek language. It is a hybrid model trained jointly with Transducer and CTC losses on the FastConformer architecture, with 115M parameters. It takes 16 kHz mono-channel audio as input and outputs transcribed text in the upper- and lower-case Uzbek alphabet.
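Because the model expects 16 kHz mono audio, it can help to check files before transcription. The following is a minimal sketch using only Python's standard `wave` module; the function names `is_supported_wav` and `write_silence` are hypothetical helpers introduced for illustration, and the check covers uncompressed WAV files only.

```python
import wave

EXPECTED_RATE = 16000   # 16 kHz, as stated in the model description
EXPECTED_CHANNELS = 1   # mono


def is_supported_wav(path):
    """Return True if the WAV file at `path` is 16 kHz mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE
                and wav.getnchannels() == EXPECTED_CHANNELS)


def write_silence(path, rate, channels, seconds=1):
    """Write a silent 16-bit PCM WAV file; used only to try out the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * channels * seconds)
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.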

Implementation Details

The model is implemented with NVIDIA's NeMo toolkit and uses FastConformer, an optimized version of the Conformer architecture with 8x depthwise-separable convolutional downsampling. It was trained on approximately 1,000 hours of Uzbek speech from multiple sources, including Mozilla Common Voice, UzbekVoice, and Google FLEURS.
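To give a feel for what the 8x downsampling mentioned above means in practice, here is a small arithmetic sketch. It assumes a 10 ms hop between input feature frames, which is a common ASR front-end setting but is not stated in this card; the function name `encoder_frames` is likewise hypothetical.

```python
FEATURE_HOP_MS = 10      # assumed hop between feature frames (not stated in this card)
DOWNSAMPLING_FACTOR = 8  # from the description above


def encoder_frames(audio_seconds, hop_ms=FEATURE_HOP_MS, factor=DOWNSAMPLING_FACTOR):
    """Approximate number of encoder frames for a clip of the given duration."""
    feature_frames = int(audio_seconds * 1000) // hop_ms
    return feature_frames // factor


print(encoder_frames(10))  # 10 s -> 1000 feature frames -> 125 encoder frames
```

Under that assumption each encoder frame covers roughly 80 ms of audio, so the encoder works on a sequence eight times shorter than the feature sequence.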

Hybrid architecture combining Transducer and CTC losses
Trained on approximately 1,000 hours of diverse Uzbek speech data
Supports 16 kHz mono-channel audio input
Achieves 16.46% WER on the Common Voice test set
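The WER figure above is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows that calculation in plain Python; it is an illustration of the metric, not the evaluation script used for this model, and the function name `word_error_rate` and the example sentence are made up for the demonstration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("men bugun maktabga bordim", "men bugun bordim"))  # 0.25
```

Since the model emits upper- and lower-case text, whether casing is normalized before scoring will change the result, so it is worth stating the normalization when comparing numbers.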

Core Capabilities

Transcribes Uzbek speech to text with upper- and lower-case output
Handles various speech patterns and accents
Supports both streaming and batch processing
Easy integration with NeMo toolkit for inference or fine-tuning
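As a rough sketch of the NeMo integration, the snippet below loads the checkpoint and transcribes a list of audio files. It assumes NeMo's ASR collection is installed and that the checkpoint is published under the identifier `nvidia/stt_uz_fastconformer_hybrid_large_pc` (inferred from the model name and publisher on this card); the wrapper `transcribe_files` is a hypothetical helper, not part of NeMo.

```python
# Assumed checkpoint identifier, built from the publisher and model name on this card.
MODEL_NAME = "nvidia/stt_uz_fastconformer_hybrid_large_pc"


def transcribe_files(paths, model_name=MODEL_NAME):
    """Load the pretrained model and transcribe a list of 16 kHz mono audio files."""
    # Imported lazily so this module can be loaded where NeMo is not installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use and restores the matching model class.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)

    # The return type depends on the NeMo version: it may be a list of strings
    # or a list of hypothesis objects that carry the text.
    return model.transcribe(paths)
```

A call such as `transcribe_files(["sample.wav"])` would then return one result per input file, where `sample.wav` stands in for your own recording.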

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its hybrid architecture, which combines Transducer and CTC losses, and for being trained specifically on Uzbek. Roughly 1,000 hours of training data and the FastConformer architecture yield a 16.46% WER on the Common Voice test set.
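One way to picture the hybrid objective is as a weighted sum of the two losses. The sketch below assumes that form; the function name `hybrid_loss` and the default weight of 0.3 are illustrative choices, not values taken from this card.

```python
def hybrid_loss(transducer_loss, ctc_loss, ctc_weight=0.3):
    """Combine the two training losses as a weighted sum.

    ctc_weight is an illustrative value; the weight used to train this
    model is not stated in the card.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be between 0 and 1")
    return (1.0 - ctc_weight) * transducer_loss + ctc_weight * ctc_loss
```

With `ctc_weight=0.0` the objective reduces to the Transducer loss alone, and with `ctc_weight=1.0` to the CTC loss alone.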

Q: What are the recommended use cases?

The model is suited to Uzbek speech transcription tasks, including automated transcription services, voice assistants, and speech analytics applications. Input audio should be 16 kHz mono, matching the format the model expects.