NVIDIA Streaming Citrinet 1024 (Ukrainian)
| Property | Value |
|---|---|
| Parameter Count | 141M |
| License | CC-BY-4.0 |
| Architecture | Citrinet-CTC |
| Paper | Citrinet Paper |
| WER (Common Voice 10.0) | 5.02% |
What is stt_uk_citrinet_1024_gamma_0_25?
This is a non-autoregressive speech recognition model for Ukrainian. Built on NVIDIA's Citrinet architecture, it was fine-tuned from a pre-trained Russian model using a cross-language transfer learning approach. The model accepts 16 kHz, mono-channel audio and outputs transcriptions as lowercase text in the Ukrainian alphabet.
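As a rough usage sketch (assuming the checkpoint is published under the name `nvidia/stt_uk_citrinet_1024_gamma_0_25` and that the `EncDecCTCModelBPE.from_pretrained` and `transcribe` calls behave this way in your installed NeMo version), transcribing a 16 kHz mono WAV file might look like this:

```python
# Minimal sketch, assuming nemo_toolkit[asr] is installed and the model name
# below matches the published checkpoint; verify against your NeMo version.
import nemo.collections.asr as nemo_asr

# Load the pretrained Ukrainian Citrinet-1024 CTC model (downloads the checkpoint).
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="nvidia/stt_uk_citrinet_1024_gamma_0_25"
)

# Transcribe one or more 16 kHz, mono-channel WAV files.
transcriptions = asr_model.transcribe(["sample_uk_16k_mono.wav"])
print(transcriptions[0])  # lowercase Ukrainian text
```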
Implementation Details
The model uses the Citrinet-1024 architecture and was trained for 1000 epochs with the NeMo toolkit on 69 hours of validated speech from the Mozilla Common Voice Corpus 10.0, excluding the dev and test sets. It employs a SentencePiece unigram tokenizer with a vocabulary size of 1024.
- Non-autoregressive architecture optimized for streaming
- CTC loss/decoding (a conceptual greedy-decoding sketch follows this list)
- Supports production deployment through NVIDIA Riva
- Compatible with PyTorch framework
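As context for the CTC decoding bullet above, the sketch below illustrates plain greedy CTC decoding on a toy logit matrix: take the argmax per frame, collapse repeats, drop blanks. It is a conceptual illustration only; the vocabulary, blank index, and logits are made up and do not come from this model, whose actual decoding runs inside NeMo or Riva.

```python
# Conceptual illustration of greedy CTC decoding; toy vocabulary and logits,
# not the model's actual tokenizer or outputs.
import torch

# Hypothetical 5-symbol vocabulary; index 0 is the CTC blank.
vocab = ["<blank>", "п", "р", "и", "в"]
blank_id = 0

# Fake per-frame logits with shape (time_steps, vocab_size).
logits = torch.tensor([
    [0.1, 2.0, 0.0, 0.0, 0.0],   # "п"
    [0.1, 2.0, 0.0, 0.0, 0.0],   # "п" (repeat, will be collapsed)
    [2.0, 0.0, 0.1, 0.0, 0.0],   # blank
    [0.0, 0.0, 2.0, 0.1, 0.0],   # "р"
    [0.0, 0.0, 0.0, 2.0, 0.0],   # "и"
    [0.0, 0.0, 0.0, 0.0, 2.0],   # "в"
])

# Greedy CTC decode: argmax per frame, collapse consecutive repeats, drop blanks.
best_ids = logits.argmax(dim=-1).tolist()
decoded = []
prev = None
for idx in best_ids:
    if idx != prev and idx != blank_id:
        decoded.append(vocab[idx])
    prev = idx

print("".join(decoded))  # -> "прив"
```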
Core Capabilities
- Real-time speech transcription in Ukrainian
- Handles typical, clearly spoken Ukrainian with high accuracy (5.02% WER on Common Voice 10.0)
- Supports streaming applications
- Integration with NVIDIA Riva for production deployments
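For long recordings, one pragmatic offline approach is to split the audio into fixed-size chunks and batch-transcribe them. The sketch below does this with `soundfile`, which is an assumption on my part rather than a documented workflow for this model; genuine low-latency streaming would normally go through NVIDIA Riva or NeMo's streaming inference utilities instead.

```python
# Rough offline chunking sketch (not true streaming). Assumes soundfile is
# installed, the input WAV is already 16 kHz mono, and `asr_model` was loaded
# as shown earlier in this card. Naive chunking can cut words at boundaries.
import math
import os
import tempfile

import soundfile as sf

CHUNK_SECONDS = 10.0

def transcribe_long_audio(asr_model, wav_path: str) -> str:
    """Split a long 16 kHz mono WAV into fixed-size chunks and transcribe each."""
    audio, sr = sf.read(wav_path)
    chunk_len = int(CHUNK_SECONDS * sr)
    n_chunks = math.ceil(len(audio) / chunk_len)

    with tempfile.TemporaryDirectory() as tmpdir:
        chunk_paths = []
        for i in range(n_chunks):
            chunk = audio[i * chunk_len:(i + 1) * chunk_len]
            path = os.path.join(tmpdir, f"chunk_{i:04d}.wav")
            sf.write(path, chunk, sr)
            chunk_paths.append(path)

        # Batch-transcribe all chunks; depending on the NeMo version, transcribe()
        # may return plain strings or hypothesis objects with a .text attribute.
        results = asr_model.transcribe(chunk_paths)
        texts = [r.text if hasattr(r, "text") else r for r in results]
        return " ".join(texts)
```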
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its use of cross-language transfer learning: it leverages knowledge from a Russian pre-trained model to reach strong Ukrainian recognition accuracy from a relatively small (69-hour) training set. Its streaming-oriented architecture and integration with NVIDIA Riva make it suitable for production environments.
Q: What are the recommended use cases?
The model is ideal for Ukrainian speech transcription tasks, particularly in applications requiring real-time processing. It's best suited for clear speech in standard Ukrainian, though performance may vary with technical terms or heavy accents.