Parakeet-TDT-1.1B

Maintained by: nvidia

Property         Value
Parameter Count  1.1 billion
Model Type       Automatic Speech Recognition (ASR)
Architecture     FastConformer-TDT
License          CC-BY-4.0
Input Format     16000 Hz mono-channel audio (WAV)

What is parakeet-tdt-1.1b?

Parakeet-TDT-1.1B is an ASR model developed jointly by the NVIDIA NeMo and Suno.ai teams. It is the XXL version of FastConformer TDT, with about 1.1 billion parameters, and was trained on 64,000 hours of English speech drawn from both private and public datasets.

Implementation Details

The model uses the FastConformer-TDT architecture. FastConformer is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. The Token-and-Duration Transducer (TDT) decoder decouples token and duration predictions: alongside each token it predicts how many input frames to advance, so inference can jump ahead instead of stepping through every frame and emitting blanks, which speeds up decoding.
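The frame-skipping idea can be illustrated with a toy greedy decoding loop. This is a minimal sketch under stated assumptions, not NeMo's implementation: `toy_joint`, the `SCRIPT` table, the `DURATIONS` set, and the token ids are all hypothetical stand-ins for the real encoder, prediction network, and joint network.

```python
from typing import Callable, List, Tuple

BLANK = 0                    # hypothetical blank token id
DURATIONS = (0, 1, 2, 3, 4)  # hypothetical set of durations the decoder may predict

# A joint function maps (frame index, last emitted token) to
# (predicted token, index into DURATIONS). Hypothetical interface.
JointFn = Callable[[int, int], Tuple[int, int]]


def tdt_greedy_decode(
    num_frames: int, joint: JointFn, max_symbols_per_frame: int = 10
) -> Tuple[List[int], int]:
    """Greedy TDT-style decoding. Returns (tokens, number of joint calls)."""
    tokens: List[int] = []
    t = 0
    last_token = BLANK
    joint_calls = 0
    symbols_at_frame = 0
    while t < num_frames:
        token, dur_idx = joint(t, last_token)
        duration = DURATIONS[dur_idx]
        joint_calls += 1
        if token != BLANK:
            tokens.append(token)
            last_token = token
        if duration == 0:
            # Duration 0 lets several tokens share one frame. Force progress
            # on a blank, or after too many tokens, so the loop terminates.
            symbols_at_frame += 1
            if token == BLANK or symbols_at_frame >= max_symbols_per_frame:
                duration = 1
        if duration > 0:
            symbols_at_frame = 0
        t += duration  # jump ahead by the predicted duration
    return tokens, joint_calls


# Scripted toy predictions keyed by frame index (made-up values).
SCRIPT = {0: (5, 4), 4: (BLANK, 3), 7: (9, 2), 9: (BLANK, 3)}


def toy_joint(frame: int, last_token: int) -> Tuple[int, int]:
    # Frames not in the script default to "blank, advance one frame".
    return SCRIPT.get(frame, (BLANK, 1))


tokens, joint_calls = tdt_greedy_decode(12, toy_joint)
print(tokens, joint_calls)
```

With these scripted predictions the loop covers 12 frames in 4 joint calls, whereas a decoder that advances one frame at a time would need at least one call per frame.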

  • Trained with the NVIDIA NeMo toolkit for several hundred epochs
  • Uses a SentencePiece Unigram tokenizer with a vocabulary of 1024 tokens
  • Reports low word error rates (WER) across several datasets (e.g., 1.39% on LibriSpeech test-clean)
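For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it for a single utterance with a standard word-level edit distance; the function name and example sentences are illustrative, and benchmark figures are normally computed over a whole test set after text normalization, which this sketch does not attempt beyond lowercasing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```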

Core Capabilities

  • Transcribes English speech to lowercase text
  • Processes 16 kHz mono-channel audio
  • Demonstrates robust performance across multiple domains
  • Shows fair performance across different gender and age groups
  • Integrates with applications through the NVIDIA NeMo toolkit
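Because the model expects 16 kHz mono-channel input, it can help to check a file before transcription. The following is a minimal sketch that uses only Python's standard `wave` module; the function name `check_wav_format` is illustrative, and the check covers uncompressed PCM WAV files only. Files that fail the check would need to be resampled or downmixed with a separate audio tool.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, from the model's stated input format
EXPECTED_CHANNELS = 1         # mono


def check_wav_format(path: str) -> list:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```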

Frequently Asked Questions

Q: What makes this model unique?

Two things distinguish the model: its FastConformer-TDT architecture and the size of its training set (64,000 hours of English speech). Because the TDT decoder predicts a duration along with each token, it can skip frames that a conventional transducer would spend emitting blank predictions, which makes inference faster.

Q: What are the recommended use cases?

The model is suited to general-purpose English speech transcription, particularly where accuracy is needed across different domains. Typical applications include academic research, content transcription, and enterprise solutions built on the NVIDIA NeMo toolkit.
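A minimal sketch of loading the model through NeMo is shown below. It assumes the `nemo_toolkit[asr]` package is installed and that the checkpoint is available under the name `nvidia/parakeet-tdt-1.1b`; the wrapper function `transcribe_files` and the file name in the usage comment are hypothetical. The exact return type of `transcribe` differs between NeMo versions, so the sketch returns it unchanged.

```python
def transcribe_files(paths):
    """Transcribe 16 kHz mono WAV files and return NeMo's raw output."""
    # Imported lazily so this function can be defined without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use (assumed model name).
    asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-1.1b"
    )
    return asr_model.transcribe(list(paths))


# Usage (hypothetical file name):
# results = transcribe_files(["sample_16k_mono.wav"])
```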
