NVIDIA Streaming Citrinet 1024 (Ukrainian)
| Property | Value |
|---|---|
| Parameter Count | 141M |
| License | CC-BY-4.0 |
| Architecture | Citrinet-CTC |
| Paper | Citrinet Paper |
| WER (Common Voice 10.0) | 5.02% |
What is stt_uk_citrinet_1024_gamma_0_25?
This is a non-autoregressive speech recognition model for Ukrainian. Built on NVIDIA's Citrinet architecture, it was fine-tuned from a pre-trained Russian model using a cross-language transfer learning approach. The model accepts 16 kHz, mono-channel audio and outputs transcriptions as lowercase text in the Ukrainian alphabet.
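As a rough usage sketch (assuming the checkpoint is published under the name `nvidia/stt_uk_citrinet_1024_gamma_0_25` and that the `EncDecCTCModelBPE.from_pretrained` and `transcribe` calls behave this way in your installed NeMo version), transcribing a 16 kHz mono WAV file might look like this:

```python
# Minimal sketch, assuming nemo_toolkit[asr] is installed and the model name
# below matches the published checkpoint; verify against your NeMo version.
import nemo.collections.asr as nemo_asr

# Load the pretrained Ukrainian Citrinet-1024 CTC model (downloads the checkpoint).
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="nvidia/stt_uk_citrinet_1024_gamma_0_25"
)

# Transcribe one or more 16 kHz, mono-channel WAV files.
transcriptions = asr_model.transcribe(["sample_uk_16k_mono.wav"])
print(transcriptions[0])  # lowercase Ukrainian text
```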
Implementation Details
The model uses the Citrinet-1024 architecture and was trained for 1000 epochs with the NeMo toolkit on 69 hours of validated speech from the Mozilla Common Voice Corpus 10.0, excluding the dev and test sets. It employs a SentencePiece unigram tokenizer with a vocabulary size of 1024.
- Non-autoregressive architecture optimized for streaming
- CTC loss/decoding (a conceptual greedy-decoding sketch follows this list)
- Supports production deployment through NVIDIA Riva
- Compatible with PyTorch framework
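As context for the CTC decoding bullet above, the sketch below illustrates plain greedy CTC decoding on a toy logit matrix: take the argmax per frame, collapse repeats, drop blanks. It is a conceptual illustration only; the vocabulary, blank index, and logits are made up and do not come from this model, whose actual decoding runs inside NeMo or Riva.

```python
# Conceptual illustration of greedy CTC decoding; toy vocabulary and logits,
# not the model's actual tokenizer or outputs.
import torch

# Hypothetical 5-symbol vocabulary; index 0 is the CTC blank.
vocab = ["<blank>", "п", "р", "и", "в"]
blank_id = 0

# Fake per-frame logits with shape (time_steps, vocab_size).
logits = torch.tensor([
    [0.1, 2.0, 0.0, 0.0, 0.0],   # "п"
    [0.1, 2.0, 0.0, 0.0, 0.0],   # "п" (repeat, will be collapsed)
    [2.0, 0.0, 0.1, 0.0, 0.0],   # blank
    [0.0, 0.0, 2.0, 0.1, 0.0],   # "р"
    [0.0, 0.0, 0.0, 2.0, 0.0],   # "и"
    [0.0, 0.0, 0.0, 0.0, 2.0],   # "в"
])

# Greedy CTC decode: argmax per frame, collapse consecutive repeats, drop blanks.
best_ids = logits.argmax(dim=-1).tolist()
decoded = []
prev = None
for idx in best_ids:
    if idx != prev and idx != blank_id:
        decoded.append(vocab[idx])
    prev = idx

print("".join(decoded))  # -> "прив"
```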
Core Capabilities
- Real-time speech transcription in Ukrainian
- Handles typical, clearly spoken Ukrainian with high accuracy (5.02% WER on Common Voice 10.0)
- Supports streaming applications
- Integration with NVIDIA Riva for production deployments
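For long recordings, one pragmatic offline approach is to split the audio into fixed-size chunks and batch-transcribe them. The sketch below does this with `soundfile`, which is an assumption on my part rather than a documented workflow for this model; genuine low-latency streaming would normally go through NVIDIA Riva or NeMo's streaming inference utilities instead.

```python
# Rough offline chunking sketch (not true streaming). Assumes soundfile is
# installed, the input WAV is already 16 kHz mono, and `asr_model` was loaded
# as shown earlier in this card. Naive chunking can cut words at boundaries.
import math
import os
import tempfile

import soundfile as sf

CHUNK_SECONDS = 10.0

def transcribe_long_audio(asr_model, wav_path: str) -> str:
    """Split a long 16 kHz mono WAV into fixed-size chunks and transcribe each."""
    audio, sr = sf.read(wav_path)
    chunk_len = int(CHUNK_SECONDS * sr)
    n_chunks = math.ceil(len(audio) / chunk_len)

    with tempfile.TemporaryDirectory() as tmpdir:
        chunk_paths = []
        for i in range(n_chunks):
            chunk = audio[i * chunk_len:(i + 1) * chunk_len]
            path = os.path.join(tmpdir, f"chunk_{i:04d}.wav")
            sf.write(path, chunk, sr)
            chunk_paths.append(path)

        # Batch-transcribe all chunks; depending on the NeMo version, transcribe()
        # may return plain strings or hypothesis objects with a .text attribute.
        results = asr_model.transcribe(chunk_paths)
        texts = [r.text if hasattr(r, "text") else r for r in results]
        return " ".join(texts)
```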
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its use of cross-language transfer learning: it leverages knowledge from a Russian pre-trained model to reach strong Ukrainian recognition accuracy from a relatively small (69-hour) training set. Its streaming-oriented architecture and integration with NVIDIA Riva make it suitable for production environments.
Q: What are the recommended use cases?
The model is ideal for Ukrainian speech transcription tasks, particularly in applications requiring real-time processing. It's best suited for clear speech in standard Ukrainian, though performance may vary with technical terms or heavy accents.