snac_24khz

hubertsiuzdak

SNAC is a neural audio codec that compresses 24 kHz audio to 0.98 kbps using hierarchical tokens. The 19.8M-parameter model is optimized for speech synthesis.

Property        Value
Model Size      19.8M parameters
License         MIT
Bitrate         0.98 kbps
Sample Rate     24 kHz
Architecture    Multi-Scale Neural Audio Codec

What is snac_24khz?

SNAC (Multi-Scale Neural Audio Codec) is an audio compression model designed for speech synthesis applications. It encodes audio as hierarchical tokens, which keeps quality high at a bitrate of 0.98 kbps.

Implementation Details

The model uses 3 RVQ (Residual Vector Quantization) levels that operate at different temporal resolutions: 12, 23, and 47 Hz. Because the coarser levels emit fewer tokens per second, the total bitrate stays low while audio quality is preserved.

  • Supports single-channel (mono) audio processing
  • Implements hierarchical token compression similar to SoundStream and EnCodec
  • Samples coarse tokens at lower rates than fine tokens
  • Compresses audio to a bitrate of 0.98 kbps
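
The 0.98 kbps figure can be checked against the three token rates with a short calculation. The sketch below assumes each token indexes a codebook of 4096 entries (12 bits per token); that codebook size is not stated on this page and is an assumption made for illustration.

```python
import math

# Token rates of the three RVQ levels, as listed above (tokens per second).
TOKEN_RATES_HZ = (12, 23, 47)

# ASSUMPTION: codebook size is not given in this document; 4096 entries
# (12 bits per token) is used here only to illustrate the arithmetic.
CODEBOOK_SIZE = 4096


def bitrate_bps(rates_hz, codebook_size):
    """Total bitrate: tokens per second across all levels times bits per token."""
    bits_per_token = math.log2(codebook_size)
    return sum(rates_hz) * bits_per_token


total = bitrate_bps(TOKEN_RATES_HZ, CODEBOOK_SIZE)
print(f"{total:.0f} bps = {total / 1000:.2f} kbps")
```

Under that assumption the three levels together emit 82 tokens per second, which matches the stated bitrate after rounding.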

Core Capabilities

  • High-quality speech audio compression
  • Efficient encoding and decoding of 24kHz audio
  • Variable temporal resolution processing
  • PyTorch-based implementation with CUDA support
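
A minimal sketch of an encode/decode round trip is shown here. It assumes the `snac` Python package and `torch` are installed, and that the checkpoint is published on the Hugging Face Hub as `hubertsiuzdak/snac_24khz` (inferred from the author and model name above). The helper names `mono_batch_shape` and `roundtrip` are hypothetical and exist only for this example.

```python
def mono_batch_shape(num_samples, batch_size=1):
    """Shape of a mono input batch: (batch, channels=1, samples)."""
    return (batch_size, 1, num_samples)


def roundtrip(audio):
    """Encode and decode a (B, 1, T) float tensor of 24 kHz mono audio.

    Imports are local so this file still loads where torch or snac
    are not installed.
    """
    import torch
    from snac import SNAC  # assumed installed, e.g. `pip install snac`

    # Use CUDA when available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # ASSUMPTION: hub id inferred from the author and model name.
    model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().to(device)

    with torch.inference_mode():
        codes = model.encode(audio.to(device))  # one token tensor per level
        audio_hat = model.decode(codes)         # reconstructed waveform
    return codes, audio_hat
```

For example, `roundtrip(torch.randn(*mono_batch_shape(24000)))` would push one second of placeholder noise through the codec; real use would pass a loaded 24 kHz mono waveform instead.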

Frequently Asked Questions

Q: What makes this model unique?

SNAC's distinctive feature is its multi-scale approach: coarse tokens are sampled less frequently, so each one covers a broader time span. This reduces the number of tokens per second while maintaining audio quality.
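
To make "broader time spans" concrete, the following sketch computes how much audio one token covers at each level and roughly how many tokens a clip produces. It uses the nominal rates of 12, 23, and 47 Hz quoted above; the exact per-level counts from the real model may differ slightly because those rates are rounded. The function names are hypothetical.

```python
# Nominal token rates per level, coarse to fine (tokens per second).
TOKEN_RATES_HZ = (12, 23, 47)


def token_span_ms(rate_hz):
    """Duration of audio covered by a single token, in milliseconds."""
    return 1000.0 / rate_hz


def tokens_per_level(duration_s):
    """Approximate token count at each level for a clip of the given length."""
    return [round(duration_s * rate) for rate in TOKEN_RATES_HZ]


for rate in TOKEN_RATES_HZ:
    print(f"{rate} Hz level: one token spans about {token_span_ms(rate):.1f} ms")

print("Approximate tokens for a 10 s clip:", tokens_per_level(10))
```

A token at the coarsest level spans roughly four times as much audio as a token at the finest level, which is why the coarse stream contributes so few tokens to the total.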

Q: What are the recommended use cases?

This model is primarily optimized for speech synthesis applications. While it can process other audio types, it performs best with speech data due to its specific training focus.
