stt_uz_fastconformer_hybrid_large_pc

Maintained by: nvidia

NVIDIA FastConformer-Hybrid Large (Uzbek)

Property              Value
Parameter Count       115M
License               CC-BY-4.0
Architecture          FastConformer-Transducer CTC
Paper                 Fast Conformer Paper
WER (Common Voice)    16.46%

What is stt_uz_fastconformer_hybrid_large_pc?

This is an automatic speech recognition (ASR) model for the Uzbek language. It is a hybrid model trained with both Transducer and CTC losses, built on the FastConformer architecture with 115M parameters. The model takes 16 kHz mono-channel audio as input and outputs transcribed text in the Uzbek alphabet, using both upper- and lower-case letters.
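Since the model expects 16 kHz mono-channel audio, it can help to check a file's format before transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and the constants are illustrative and not part of NeMo or the model card.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate stated in the model description
EXPECTED_CHANNELS = 1      # mono-channel input


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

A file that returns a non-empty list would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.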

Implementation Details

The model is implemented with NVIDIA's NeMo toolkit. FastConformer is an optimized version of the Conformer architecture that uses 8x depthwise-separable convolutional downsampling. The model was trained on approximately 1000 hours of Uzbek speech data from multiple sources, including Mozilla Common Voice, UzbekVoice, and Google FLEURS.
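To give a feel for what 8x downsampling means for sequence length, here is a rough arithmetic sketch. It assumes a 10 ms feature hop, which is a common ASR front-end setting but is not stated in this description; the exact frame count in a real model also depends on padding, so treat the numbers as approximate.

```python
DOWNSAMPLING_FACTOR = 8  # from the model description
HOP_MS = 10              # ASSUMPTION: feature hop size, not stated in the source


def encoder_frame_ms(hop_ms=HOP_MS, factor=DOWNSAMPLING_FACTOR):
    """Approximate audio duration covered by one encoder output frame."""
    return hop_ms * factor


def approx_encoder_frames(duration_ms, hop_ms=HOP_MS, factor=DOWNSAMPLING_FACTOR):
    """Approximate number of encoder frames for an utterance of the given length."""
    feature_frames = duration_ms // hop_ms
    return -(-feature_frames // factor)  # ceiling division


print(encoder_frame_ms())             # 80
print(approx_encoder_frames(10_000))  # 125
```

Under these assumptions, a 10-second utterance shrinks from about 1000 feature frames to about 125 encoder frames, so the encoder's attention layers work on a much shorter sequence.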

  • Hybrid architecture combining Transducer and CTC losses
  • Trained on approximately 1000 hours of diverse Uzbek speech data
  • Supports 16 kHz mono-channel audio input
  • Achieves 16.46% WER on the Common Voice test set
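For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; it is not the evaluation script used to produce the 16.46% figure, and it compares words exactly (case and punctuation included).

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

A WER of 16.46% therefore corresponds to roughly one word-level error for every six reference words.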

Core Capabilities

  • Transcribes Uzbek speech to text (16.46% WER on the Common Voice test set)
  • Handles various speech patterns and accents
  • Supports both streaming and batch processing
  • Integrates with the NeMo toolkit for inference or fine-tuning
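A minimal inference sketch with NeMo might look like the following. It assumes the `nemo_toolkit[asr]` package is installed and that the checkpoint is published under the identifier `nvidia/stt_uz_fastconformer_hybrid_large_pc`; that identifier is inferred from the model name and maintainer shown on this page, and the helper `transcribe_files` is hypothetical. The return type of `transcribe` differs between NeMo versions, so the result is passed back unchanged.

```python
# ASSUMPTION: hub identifier inferred from the model name and maintainer above.
MODEL_NAME = "nvidia/stt_uz_fastconformer_hybrid_large_pc"


def transcribe_files(audio_paths, use_ctc_decoder=False):
    """Load the pretrained model and transcribe a list of 16 kHz mono WAV files."""
    # Imported lazily so this module can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)
    if use_ctc_decoder:
        # Hybrid models decode with the Transducer head by default;
        # this switches to the auxiliary CTC head instead.
        model.change_decoding_strategy(decoder_type="ctc")
    return model.transcribe(audio_paths)
```

Calling `transcribe_files(["sample.wav"])` downloads the checkpoint on first use, where `sample.wav` is a placeholder path to your own recording.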

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its hybrid architecture, which combines Transducer and CTC losses, and for being trained specifically for Uzbek. It pairs approximately 1000 hours of training data with the FastConformer architecture and reports 16.46% WER on the Common Voice test set.
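One common way to combine the two objectives in a hybrid model is a weighted sum of the Transducer loss and the CTC loss. The sketch below illustrates that idea only; the function name and the default weight of 0.3 are assumptions for illustration, and this page does not state how the losses were actually weighted during training.

```python
def hybrid_loss(transducer_loss, ctc_loss, ctc_weight=0.3):
    """Weighted sum of Transducer and CTC losses.

    ASSUMPTION: ctc_weight=0.3 is an illustrative default, not a value
    taken from the model description.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be between 0 and 1")
    return (1.0 - ctc_weight) * transducer_loss + ctc_weight * ctc_loss
```

With `ctc_weight=0.0` this reduces to a pure Transducer objective, and with `ctc_weight=1.0` to a pure CTC objective; values in between train both decoder heads on a shared encoder.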

Q: What are the recommended use cases?

The model is suited to Uzbek speech transcription tasks, including automated transcription services, voice assistants, and speech analytics applications.
