Parakeet RNNT 1.1B

Property	Value
Parameter Count	1.1 Billion
Model Type	Automatic Speech Recognition (ASR)
Architecture	FastConformer Transducer
License	CC-BY-4.0
Training Data	64K hours of English speech

What is parakeet-rnnt-1.1b?

Parakeet RNNT 1.1B is an automatic speech recognition (ASR) model jointly developed by the NVIDIA NeMo and Suno.ai teams. It is an XXL version of the FastConformer Transducer architecture, with about 1.1 billion parameters, and it transcribes speech into lowercase English text. The model was trained on 64,000 hours of English speech drawn from both private and public datasets.

Implementation Details

The model is built on FastConformer, an optimized version of the Conformer architecture that uses 8x depthwise-separable convolutional downsampling. It takes 16 kHz mono-channel audio as input and outputs text transcriptions. Text is tokenized with a SentencePiece Unigram tokenizer with a vocabulary size of 1024.
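To make the effect of 8x downsampling concrete, the sketch below estimates how many encoder frames a clip produces. The 10 ms feature hop is an assumption (a common choice for ASR front ends, not stated in this card), the helper name is hypothetical, and padding details are ignored.

```python
import math

SAMPLE_RATE_HZ = 16_000  # input sample rate stated in this card
FEATURE_HOP_MS = 10      # assumption: 10 ms hop between feature frames
DOWNSAMPLING = 8         # 8x convolutional downsampling stated in this card


def approx_encoder_frames(num_samples: int) -> int:
    """Roughly estimate the number of encoder output frames for a clip."""
    hop_samples = SAMPLE_RATE_HZ * FEATURE_HOP_MS // 1000  # 160 samples per feature frame
    feature_frames = num_samples // hop_samples
    return math.ceil(feature_frames / DOWNSAMPLING)


ten_seconds = 10 * SAMPLE_RATE_HZ
# 1000 feature frames are reduced to 125 encoder frames under these assumptions.
print(approx_encoder_frames(ten_seconds))
```

Under these assumptions each encoder frame covers roughly 80 ms of audio, so the decoder works on a much shorter sequence than the raw feature stream.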

Trained using the NVIDIA NeMo toolkit
Trained in a multitask setup with a Transducer decoder (RNNT) loss
Reports a word error rate (WER) of 1.46% on LibriSpeech test-clean, among other datasets
Usable from both the Python API and the command-line interface
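For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed from a reference transcript and a model hypothesis. It uses a plain word-level edit distance and lowercases both strings as an assumed normalization step. It does not reproduce the exact text normalization behind the reported figures, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 1.46% corresponds to roughly one or two word errors per hundred reference words.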

Core Capabilities

High-accuracy speech transcription to lowercase English text
Processes 16kHz mono-channel audio files
Efficient processing through FastConformer's 8x downsampling
Supports batch processing of multiple audio files
Easy integration through NeMo toolkit
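Because the model expects 16 kHz mono-channel audio, it can help to check files before sending a batch for transcription. The following sketch does this with Python's standard `wave` module. Both helper names are hypothetical, and the check only covers uncompressed WAV files.

```python
import wave


def is_16k_mono_wav(path: str) -> bool:
    """Return True if the WAV file at `path` is 16 kHz, single-channel audio."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == 16_000 and wav_file.getnchannels() == 1


def filter_batch(paths: list[str]) -> list[str]:
    """Keep only the files that already match the expected input format."""
    return [p for p in paths if is_16k_mono_wav(p)]
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before transcription.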

Frequently Asked Questions

Q: What makes this model unique?

The model combines a large architecture (1.1B parameters) with extensive training data (64K hours from both private and public datasets). FastConformer's 8x downsampling shortens the sequence the encoder has to process, which helps keep inference efficient at this scale.

Q: What are the recommended use cases?

The model is well suited to large-scale English speech recognition tasks that require high accuracy. It can be used in both research and production environments, particularly when deployed through the NVIDIA NeMo toolkit or NVIDIA Riva.
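As a starting point for the NeMo route, the sketch below loads the checkpoint and transcribes a list of files. It assumes `nemo_toolkit[asr]` is installed and that the checkpoint is published under the identifier `nvidia/parakeet-rnnt-1.1b`. The wrapper function name is hypothetical, and the shape of the value returned by `transcribe()` varies between NeMo releases, so it is returned unmodified.

```python
def transcribe_with_nemo(audio_paths):
    """Load the checkpoint through NeMo and transcribe a list of audio files.

    NeMo is imported lazily because it is a heavy optional dependency.
    """
    import nemo.collections.asr as nemo_asr

    # Assumed checkpoint identifier; adjust it if your copy is stored elsewhere.
    asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
        model_name="nvidia/parakeet-rnnt-1.1b"
    )
    # The return type differs across NeMo versions, so no unpacking is done here.
    return asr_model.transcribe(list(audio_paths))


# Example call (placeholder path, not a file shipped with the model):
# results = transcribe_with_nemo(["sample_16k_mono.wav"])
```

Passing several paths in one call lets NeMo process them as a batch instead of reloading the model for each file.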