encodec_24khz

Maintained By
facebook

EnCodec 24kHz Neural Audio Codec

Property      Value
Parameters    23.3M
Model Type    Audio Codec
Author        Meta AI
Paper         High Fidelity Neural Audio Compression
Tensor Type   F32

What is encodec_24khz?

EnCodec 24kHz is a state-of-the-art neural audio codec developed by Meta AI for real-time audio compression and decompression. It uses a streaming encoder-decoder architecture with a quantized latent space, trained end-to-end. Training relies on a single multiscale spectrogram adversary to reduce artifacts and improve audio quality.
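To make "quantized latent space" more concrete, here is a toy sketch of residual vector quantization (RVQ), the scheme the EnCodec paper describes for its latents. The codebooks, vector size, and function names below are made up for illustration and are not the model's real values: each codebook quantizes whatever error the previous one left behind.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x with a cascade of codebooks, one index per codebook."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # The next codebook only sees what this one failed to capture.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Sum the selected entries; passing fewer codes gives a coarser result."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical 2-D codebooks: one coarse, one fine.
coarse = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
fine = [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]]
codebooks = [coarse, fine]

codes = rvq_encode([1.2, 0.9], codebooks)
full = rvq_decode(codes, codebooks)        # uses both codebooks
coarse_only = rvq_decode(codes[:1], codebooks)  # drops the fine codebook
```

Decoding with only the first code gives a rougher approximation than decoding with both, which is the intuition behind trading bitrate for quality by keeping fewer codebooks.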

Implementation Details

The model was trained on several datasets: DNS Challenge 4, Common Voice, AudioSet, FSD50K, and the Jamendo dataset. Training ran for 300 epochs on 8 A100 GPUs, using the Adam optimizer with a batch size of 64 examples.

  • Supports both streamable and non-streamable configurations
  • Operates at various bandwidths (1.5, 3, 6, and 12 kbps)
  • Includes weight normalization for convolution layers
  • Features a novel loss balancer mechanism for training stability
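The bandwidths listed above can be related to the number of codebooks in use with a little arithmetic. The sketch below assumes figures from the EnCodec paper that are not stated on this page: a total encoder stride of 320 samples (so 75 latent frames per second at 24 kHz) and codebooks of 1024 entries (10 bits per code).

```python
import math

SAMPLE_RATE_HZ = 24_000
TOTAL_STRIDE = 320        # assumption from the paper: strides 2 * 4 * 5 * 8
CODEBOOK_SIZE = 1024      # assumption from the paper: entries per codebook

frames_per_second = SAMPLE_RATE_HZ / TOTAL_STRIDE        # latent frames per second
bits_per_code = math.log2(CODEBOOK_SIZE)                 # bits to index one entry
bps_per_codebook = frames_per_second * bits_per_code     # bitrate of one codebook


def codebooks_for(bandwidth_kbps: float) -> int:
    """Number of codebooks needed to fill the given bandwidth."""
    return round(bandwidth_kbps * 1000 / bps_per_codebook)


for kbps in (1.5, 3, 6, 12):
    print(f"{kbps:>4} kbps -> {codebooks_for(kbps)} codebooks")
```

Under these assumptions each codebook costs 750 bits per second, so doubling the bandwidth doubles the number of codebooks applied to every frame.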

Core Capabilities

  • Real-time audio compression and decompression
  • High-fidelity audio reproduction
  • A further bandwidth reduction of 25-40% when paired with a language model
  • Support for both speech and music processing
  • Multiple sampling rate compatibility
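For a rough sense of scale, the codec bitrates can be compared with uncompressed audio. The sketch below assumes 16-bit mono PCM at 24 kHz as the baseline (an assumption made here for illustration, not a figure from this page) and applies the 25-40% language-model saving quoted above to a 6 kbps stream.

```python
SAMPLE_RATE_HZ = 24_000
BITS_PER_SAMPLE = 16   # assumption: uncompressed 16-bit mono PCM baseline

raw_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000   # uncompressed bitrate in kbps


def compression_ratio(codec_kbps: float) -> float:
    """How many times smaller the coded stream is than raw PCM."""
    return raw_kbps / codec_kbps


def with_language_model(codec_kbps: float, saving: float) -> float:
    """Bitrate after a fractional saving (0.25 means 25% fewer bits)."""
    return codec_kbps * (1.0 - saving)


for kbps in (1.5, 3, 6, 12):
    print(f"{kbps:>4} kbps is about {compression_ratio(kbps):.0f}x smaller than raw PCM")

low = with_language_model(6, 0.40)
high = with_language_model(6, 0.25)
print(f"6 kbps with a 25-40% saving: roughly {low:.1f} to {high:.1f} kbps")
```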

Frequently Asked Questions

Q: What makes this model unique?

EnCodec stands out for its real-time performance and audio quality, consistently outperforming baselines such as Lyra-v2 and Opus. At 3 kbps it performs better than Opus does at 12 kbps, which makes it highly bandwidth-efficient.

Q: What are the recommended use cases?

The model is ideal for real-time audio compression applications, speech generation, music streaming, and text-to-speech tasks. It can be used directly or fine-tuned for specific audio processing needs in larger pipelines.
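As an illustration of direct use, the sketch below runs an encode/decode round trip through the Hugging Face `transformers` classes `EncodecModel` and `AutoProcessor`. It assumes the checkpoint is published under the repository id `facebook/encodec_24khz` and that `transformers`, `torch`, and `numpy` are installed; the helper names `sine_wave` and `roundtrip` are hypothetical, and the test tone merely stands in for real audio.

```python
import math
from typing import List


def sine_wave(freq_hz: float = 440.0, seconds: float = 1.0,
              sample_rate: int = 24_000) -> List[float]:
    """Generate a mono test tone as a list of floats in [-0.5, 0.5]."""
    n = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def roundtrip(audio: List[float], bandwidth: float = 6.0):
    """Compress and reconstruct 24 kHz mono audio at the given bandwidth (kbps)."""
    # Imported lazily so sine_wave() works without these packages installed.
    import numpy as np
    import torch
    from transformers import AutoProcessor, EncodecModel

    # Assumed repository id; downloads the weights on first use.
    model = EncodecModel.from_pretrained("facebook/encodec_24khz")
    processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

    inputs = processor(
        raw_audio=np.asarray(audio, dtype=np.float32),
        sampling_rate=processor.sampling_rate,
        return_tensors="pt",
    )
    with torch.no_grad():
        encoded = model.encode(
            inputs["input_values"], inputs["padding_mask"], bandwidth=bandwidth
        )
        decoded = model.decode(
            encoded.audio_codes, encoded.audio_scales, inputs["padding_mask"]
        )[0]
    return encoded.audio_codes, decoded
```

Calling `roundtrip(sine_wave(), bandwidth=6.0)` would return the discrete codes and the reconstructed waveform. In a larger pipeline the codes are typically what gets stored, transmitted, or modeled.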
