snac_24khz

Maintained By
hubertsiuzdak

SNAC 24kHz Audio Codec

Property        Value
Model Size      19.8M parameters
License         MIT
Bitrate         0.98 kbps
Sample Rate     24 kHz
Architecture    Multi-Scale Neural Audio Codec

What is snac_24khz?

SNAC (Multi-Scale Neural Audio Codec) is a neural audio codec that compresses audio into discrete tokens. This 24 kHz variant is designed mainly for speech synthesis. It encodes audio as a hierarchy of tokens at several time scales, which lets it reach a bitrate of 0.98 kbps while keeping audio quality high.

Implementation Details

The model uses 3 levels of residual vector quantization (RVQ), and each level operates at a different temporal resolution: 12, 23, and 47 Hz. Coarse levels emit fewer tokens per second than fine levels, which lowers the total bitrate while preserving audio quality.
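The three token rates and the 0.98 kbps figure are related by simple arithmetic. The sketch below assumes values that this card does not state: an encoder hop length of 512 samples, per-level strides of 4, 2 and 1, and a 4096-entry codebook per level (12 bits per token). Under those assumptions it reproduces the card's numbers, which serves as a consistency check rather than a confirmation of the model's actual configuration.

```python
# Derive per-level token rates and the total bitrate for a multi-scale RVQ codec.
# ASSUMPTIONS (not stated on this card): hop length 512 samples,
# per-level strides 4, 2, 1 (coarse to fine), 4096-entry codebooks.
import math

SAMPLE_RATE = 24_000      # Hz, from the card
HOP_LENGTH = 512          # assumed encoder downsampling factor
VQ_STRIDES = [4, 2, 1]    # assumed strides: coarse, medium, fine
CODEBOOK_SIZE = 4096      # assumed entries per codebook


def token_rates(sample_rate, hop, strides):
    """Tokens per second produced at each RVQ level."""
    frame_rate = sample_rate / hop          # encoder frames per second
    return [frame_rate / s for s in strides]


def bitrate_bps(rates, codebook_size):
    """Each token indexes one codebook entry, i.e. log2(codebook_size) bits."""
    bits_per_token = math.log2(codebook_size)
    return sum(rates) * bits_per_token


rates = token_rates(SAMPLE_RATE, HOP_LENGTH, VQ_STRIDES)
print([round(r, 2) for r in rates])                           # Hz per level
print(round(bitrate_bps(rates, CODEBOOK_SIZE) / 1000, 2), "kbps")
```

Rounded to whole numbers, the rates come out as 12, 23 and 47 Hz, and about 82 tokens per second at 12 bits each give roughly 0.98 kbps.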

  • Processes single-channel (mono) audio
  • Like SoundStream and EnCodec, compresses audio into hierarchical tokens
  • Samples coarse tokens less frequently than fine tokens, so each coarse token covers a longer time span
  • Compresses audio to a bitrate of 0.98 kbps
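The hierarchical, multi-scale idea can be illustrated with a toy residual vector quantizer. This is a hypothetical sketch, not SNAC's code: it uses scalar codebooks instead of learned vector codebooks, and it uses average pooling and value repetition as simple stand-ins for moving between time scales. Each level quantizes what the previous levels left over (the residual), and levels with a larger stride produce fewer, coarser tokens.

```python
# Toy multi-scale residual vector quantization (RVQ) on a 1-D signal.
# HYPOTHETICAL illustration: scalar codebooks, average pooling and repetition
# stand in for SNAC's learned quantizers; this is not SNAC's implementation.


def avg_pool(x, stride):
    """Downsample by averaging non-overlapping windows of `stride` values."""
    return [sum(x[i:i + stride]) / stride for i in range(0, len(x), stride)]


def upsample(x, stride):
    """Upsample by repeating each value `stride` times."""
    return [v for v in x for _ in range(stride)]


def nearest(value, codebook):
    """Index of the codebook entry closest to `value`."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


def multiscale_rvq(signal, strides, codebooks):
    """Quantize the residual level by level; a larger stride gives coarser tokens.

    Returns one list of code indices per level and the summed reconstruction.
    """
    residual = list(signal)
    reconstruction = [0.0] * len(signal)
    codes_per_level = []
    for stride, codebook in zip(strides, codebooks):
        if len(residual) % stride:
            raise ValueError("signal length must be a multiple of every stride")
        codes = [nearest(v, codebook) for v in avg_pool(residual, stride)]
        quantized = upsample([codebook[c] for c in codes], stride)
        residual = [r - q for r, q in zip(residual, quantized)]
        reconstruction = [a + q for a, q in zip(reconstruction, quantized)]
        codes_per_level.append(codes)
    return codes_per_level, reconstruction


# Hypothetical signal and codebooks; strides 4, 2, 1 give a 1:2:4 token ratio.
signal = [0.9, 1.1, 1.0, 0.8, -0.2, -0.4, 0.1, 0.3]
codes, recon = multiscale_rvq(
    signal,
    strides=[4, 2, 1],
    codebooks=[[-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5], [-0.25, -0.1, 0.0, 0.1, 0.25]],
)
print([len(c) for c in codes])  # → [2, 4, 8]
```

The coarse level spends only two tokens on the whole signal, and the finer levels correct the remaining error. The 1:2:4 ratio of tokens per level mirrors the roughly 12:23:47 Hz rates on this card.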

Core Capabilities

  • High-quality compression of speech audio
  • Encoding and decoding of 24 kHz audio
  • Quantization at three temporal resolutions (12, 23, and 47 Hz)
  • PyTorch-based implementation with CUDA support

Frequently Asked Questions

Q: What makes this model unique?

SNAC's distinctive feature is its multi-scale design: coarse tokens are sampled less frequently than fine tokens, so each coarse token covers a broader time span. Because fewer tokens are spent at the coarse levels, the total token rate and bitrate stay low while audio quality is maintained.
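Since the rates of about 12, 23 and 47 Hz are close to a 1:2:4 ratio, one coarse token spans roughly the same time as two medium tokens and four fine tokens. The sketch below groups the three code streams into frames on that basis and flattens them into a single sequence of 7 tokens per coarse step. The exact 1:2:4 ratio and the interleaving order are assumptions for illustration; they show one hypothetical way a speech-synthesis model might serialize the codes, not a format defined by SNAC.

```python
# Group three RVQ code streams into frames, one frame per coarse token.
# ASSUMPTION: level lengths are in an exact 1:2:4 ratio (coarse:medium:fine).
# The flattening order is a hypothetical choice, not a format defined by SNAC.


def frames(coarse, medium, fine):
    """Yield (coarse_token, medium_pair, fine_quad) for each coarse time step."""
    if len(medium) != 2 * len(coarse) or len(fine) != 4 * len(coarse):
        raise ValueError("expected level lengths in ratio 1:2:4")
    for t, c in enumerate(coarse):
        yield c, medium[2 * t:2 * t + 2], fine[4 * t:4 * t + 4]


def flatten(coarse, medium, fine):
    """Interleave the three levels into one sequence, 7 tokens per frame."""
    out = []
    for c, m, f in frames(coarse, medium, fine):
        out.extend([c, *m, *f])
    return out


# Hypothetical code values, chosen so each level is easy to spot.
tokens = flatten([10, 11], [20, 21, 22, 23], [30, 31, 32, 33, 34, 35, 36, 37])
print(tokens)  # → [10, 20, 21, 30, 31, 32, 33, 11, 22, 23, 34, 35, 36, 37]
```

Keeping each coarse token next to the finer tokens that cover the same time span means a model generating the sequence left to right always produces the broad structure of a frame before its details.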

Q: What are the recommended use cases?

This model is intended primarily for speech synthesis. It can process other types of audio, but it performs best on speech because it was trained mainly on speech data.
