stt_uz_fastconformer_hybrid_large_pc

Maintained By
nvidia

NVIDIA FastConformer-Hybrid Large (Uzbek)

Property            Value
Parameter Count     115M
License             CC-BY-4.0
Architecture        FastConformer-Transducer CTC
Paper               Fast Conformer Paper
WER (Common Voice)  16.46%

What is stt_uz_fastconformer_hybrid_large_pc?

This is an automatic speech recognition (ASR) model for the Uzbek language. It is built on NVIDIA's FastConformer architecture and is trained jointly with Transducer and CTC objectives. The training data comprises 1000 hours of Uzbek speech drawn from the Common Voice, UzbekVoice, and FLEURS datasets.

Implementation Details

The model uses FastConformer, an optimized version of the Conformer architecture with 8x depthwise-separable convolutional downsampling. It is implemented with the NVIDIA NeMo toolkit and accepts 16 kHz mono-channel audio as input. Training is hybrid: a Transducer loss (the primary decoder) and a CTC loss are optimized together on a shared encoder.

  • 115M trainable parameters
  • Supports both Transducer and CTC inference modes
  • Processes 16kHz mono-channel WAV files
  • Trained on 1000 hours of diverse Uzbek speech data
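The input requirements and the two inference modes listed above can be sketched as follows. This is a minimal sketch rather than an official recipe: the identifier `nvidia/stt_uz_fastconformer_hybrid_large_pc` is assumed to be the name under which the checkpoint is published, the helper names `is_16k_mono` and `transcribe` are hypothetical, and the NeMo calls assume a recent `nemo_toolkit[asr]` installation. NeMo is imported lazily so the format check can be used on its own.

```python
import wave

# Assumed published identifier for this checkpoint; adjust if yours differs.
MODEL_NAME = "nvidia/stt_uz_fastconformer_hybrid_large_pc"
EXPECTED_RATE = 16000  # the model expects 16 kHz audio


def is_16k_mono(path):
    """Return True if the WAV file at `path` is 16 kHz and single-channel."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == EXPECTED_RATE and wav.getnchannels() == 1


def transcribe(paths, use_ctc=False):
    """Transcribe WAV files, optionally with the CTC decoder (hypothetical helper)."""
    # Reject files that do not match the expected input format before loading the model.
    bad = [p for p in paths if not is_16k_mono(p)]
    if bad:
        raise ValueError(f"not 16 kHz mono WAV: {bad}")

    # Lazy import: requires the NeMo toolkit with ASR extras to be installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
        model_name=MODEL_NAME
    )
    if use_ctc:
        # Hybrid models decode with the Transducer head by default;
        # this switches decoding to the auxiliary CTC head.
        model.change_decoding_strategy(decoder_type="ctc")

    # The return type (strings or hypothesis objects) depends on the NeMo version.
    return model.transcribe(paths)
```

Calling `transcribe(["sample.wav"])` would use the Transducer decoder, while `transcribe(["sample.wav"], use_ctc=True)` would use the CTC decoder; `sample.wav` is a placeholder file name. Audio at other sample rates or with multiple channels needs to be resampled and downmixed first.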

Core Capabilities

  • Transcribes Uzbek speech with a 16.46% word error rate (WER) on Common Voice
  • Outputs both upper- and lower-case letters of the Uzbek alphabet
  • Outputs spaces and punctuation, including commas, question marks, and dashes
  • Easy integration with NeMo toolkit for inference or fine-tuning
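For readers unfamiliar with the WER figure quoted above, the metric is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation script behind the reported 16.46%; the function name `word_error_rate` is hypothetical, and no text normalization is applied, so case and punctuation differences count as errors.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because this model emits capitalization and punctuation, the score obtained on a given test set depends on whether both transcripts are normalized (for example, lowercased with punctuation stripped) before comparison.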

Frequently Asked Questions

Q: What makes this model unique?

This model combines the FastConformer architecture with a hybrid Transducer-CTC approach and is trained specifically for the Uzbek language. Its training set of 1000 hours of diverse Uzbek speech is intended to make it robust in real-world applications.

Q: What are the recommended use cases?

The model is suited to general-purpose Uzbek speech transcription. Performance may degrade on technical terms or specialized vocabulary that is not represented in the training data.
