Parakeet-TDT-1.1B
| Property | Value |
|---|---|
| Parameter Count | 1.1 Billion |
| Model Type | Automatic Speech Recognition (ASR) |
| Architecture | FastConformer-TDT |
| License | CC-BY-4.0 |
| Input Format | 16000 Hz mono-channel audio (WAV) |
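The input format in the table can be checked before a file is handed to the model. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` and the constant names are hypothetical and chosen for illustration.

```python
import wave

# Values taken from the "Input Format" row of the table.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1  # mono


def check_wav(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

A file that fails this check would need to be resampled or downmixed first; the sketch only reports the mismatch and does not convert anything.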
What is Parakeet-TDT-1.1B?
Parakeet-TDT-1.1B is an English ASR model jointly developed by the NVIDIA NeMo and Suno.ai teams. It is the XXL version of FastConformer TDT (about 1.1 billion parameters), trained on 64,000 hours of English speech drawn from both private and public datasets.
Implementation Details
The model uses the FastConformer-TDT architecture. FastConformer is an optimized version of the Conformer model that applies 8x depthwise-separable convolutional downsampling. The Token-and-Duration Transducer (TDT) component decouples token and duration predictions: at each step it predicts both a token and the number of frames that token covers, so the decoder can skip most of the blank predictions a conventional transducer would emit, which speeds up inference.
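To make the frame-skipping idea concrete, here is a toy sketch that compares a conventional greedy transducer loop with a TDT-style loop. It is not the NeMo implementation: the per-frame `labels` list and `toy_joint` are hypothetical stand-ins for a trained network, and the cap `MAX_DURATION = 4` is an assumed value for illustration.

```python
BLANK = None
MAX_DURATION = 4  # assumed cap on the predicted duration (illustrative)


def transducer_greedy(labels):
    """Conventional transducer: a blank advances exactly one frame."""
    t, out, steps = 0, [], 0
    emitted_here = False
    while t < len(labels):
        steps += 1  # one joint-network evaluation
        token = BLANK if emitted_here else labels[t]
        if token is BLANK:
            t += 1
            emitted_here = False
        else:
            out.append(token)
            emitted_here = True
    return out, steps


def toy_joint(labels, t):
    """Stand-in for a TDT joint network: returns (token, duration).

    The toy duration is the distance to the next frame carrying a token,
    capped at MAX_DURATION. A real model learns this and may also predict
    duration 0 for non-blank tokens; the toy never does.
    """
    nxt = t + 1
    while nxt < len(labels) and labels[nxt] is BLANK and nxt - t < MAX_DURATION:
        nxt += 1
    return labels[t], nxt - t


def tdt_greedy(labels):
    """TDT-style loop: each step jumps ahead by the predicted duration."""
    t, out, steps = 0, [], 0
    while t < len(labels):
        token, duration = toy_joint(labels, t)
        steps += 1  # one joint-network evaluation
        if token is not BLANK:
            out.append(token)
        t += duration
    return out, steps


labels = ["he", None, None, "llo", None, None, None, None, None, "world", None, None]
print(transducer_greedy(labels))  # (['he', 'llo', 'world'], 15)
print(tdt_greedy(labels))         # (['he', 'llo', 'world'], 4)
```

Both loops produce the same tokens, but the conventional loop evaluates the joint once per frame plus once per token, while the TDT-style loop needs far fewer evaluations on this toy input. The actual savings on real audio depend on the trained model.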
- Trained with the NVIDIA NeMo toolkit for several hundred epochs
- Uses a SentencePiece Unigram tokenizer with a vocabulary size of 1024
- Reports low word error rates (WER) across several datasets (e.g., 1.39% on LibriSpeech test-clean)
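For readers unfamiliar with the metric, WER is the word-level edit distance between the reference and the hypothesis (substitutions, deletions, and insertions) divided by the number of reference words, so 1.39% corresponds to roughly 1.4 errors per 100 reference words. The sketch below is a generic implementation of that definition, not the evaluation script used for the reported numbers; it assumes both strings are already normalized (for example, lowercased).

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six words
```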
Core Capabilities
- Transcribes English speech to lowercase text
- Processes 16kHz mono-channel audio
- Demonstrates robust performance across multiple domains
- Shows fair performance across different gender and age groups
- Integrates into applications through the NVIDIA NeMo toolkit
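Integration through NeMo might look like the sketch below. It assumes the NeMo toolkit is installed and that the checkpoint is published under the identifier `nvidia/parakeet-tdt-1.1b`; that identifier, the wrapper name `transcribe_files`, and the example file name are assumptions not stated in this document.

```python
MODEL_NAME = "nvidia/parakeet-tdt-1.1b"  # assumed checkpoint identifier


def transcribe_files(paths, model_name=MODEL_NAME):
    """Transcribe a list of 16 kHz mono WAV files with a pretrained NeMo model."""
    # Imported lazily so this module can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name=model_name)
    # The exact return type (plain strings or hypothesis objects) varies
    # between NeMo versions, so the result is returned unmodified.
    return asr_model.transcribe(paths)


# Example call (downloads the checkpoint on first use):
# results = transcribe_files(["sample_16k_mono.wav"])
```

Loading the model inside the function keeps the sketch short; an application that transcribes many batches would more likely load the model once and reuse it.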
Frequently Asked Questions
Q: What makes this model unique?
The combination of the FastConformer-TDT architecture and a large training set (64,000 hours) makes the model both efficient and accurate. Unlike a conventional transducer, it uses its duration predictions to skip most blank predictions during inference.
Q: What are the recommended use cases?
The model is well suited to general-purpose English speech transcription, particularly where high accuracy is needed across different domains. Typical applications include academic research, content transcription, and enterprise solutions built on the NVIDIA NeMo toolkit.