MusicGen Stereo Small
Property | Value |
---|---|
Parameters | 300M |
Developer | Meta AI (FAIR team) |
Release Date | 2023 |
License | Code: MIT, Weights: CC-BY-NC 4.0 |
Paper | Simple and Controllable Music Generation |
What is musicgen-stereo-small?
MusicGen Stereo Small is a text-to-music generation model that produces stereophonic audio. It is a fine-tuned version of the original (mono) MusicGen small model, adapted to generate stereo music with spatial depth and directionality. The model operates on 32 kHz audio tokenized by EnCodec into 4 codebooks, and it produces stereo by generating two token streams, one per channel, and interleaving them with a delay pattern.
Implementation Details
The model is a single-stage auto-regressive Transformer trained over an EnCodec tokenizer. For stereo output, EnCodec yields two token streams, one per audio channel, and the model interleaves them using a delay pattern. Because each codebook is offset from the previous one by a small delay, all codebooks can be predicted in parallel at each step, so generation needs only about 50 auto-regressive steps per second of audio.
- 32 kHz sampling rate with EnCodec tokenization
- 4 codebooks sampled at 50 Hz
- Parallel codebook prediction, enabled by small delays between codebooks
- Trained on licensed music data from the Meta Music Initiative Sound Collection, Shutterstock, and Pond5
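The interleaving and delay pattern described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the row ordering (left and right codebooks alternating), the per-row delays `[0, 0, 1, 1, 2, 2, 3, 3]`, the `PAD` placeholder, and the helper names `interleave_stereo` and `apply_delay_pattern` are all assumptions made for the example.

```python
import numpy as np

PAD = -1  # hypothetical placeholder for "no token yet" positions


def interleave_stereo(left, right):
    """Interleave two (K, T) codebook grids into one (2K, T) grid.

    Assumed row order: L0, R0, L1, R1, ... (illustrative only).
    """
    left, right = np.asarray(left), np.asarray(right)
    stacked = np.stack([left, right], axis=1)      # shape (K, 2, T)
    return stacked.reshape(-1, left.shape[1])      # shape (2K, T)


def apply_delay_pattern(codes, delays):
    """Shift row k of a (K, T) grid right by delays[k] steps.

    After shifting, each column holds tokens that can be predicted
    together in a single auto-regressive step.
    """
    codes = np.asarray(codes)
    num_rows, num_frames = codes.shape
    out = np.full((num_rows, num_frames + max(delays)), PAD, dtype=codes.dtype)
    for row, delay in enumerate(delays):
        out[row, delay:delay + num_frames] = codes[row]
    return out


# Toy example: 4 codebooks per channel, 3 frames of audio tokens.
left = np.arange(12).reshape(4, 3)
right = left + 100
streams = interleave_stereo(left, right)
delayed = apply_delay_pattern(streams, delays=[0, 0, 1, 1, 2, 2, 3, 3])
print(delayed)
```

With this layout the number of decoding steps is the number of frames plus the largest delay, which is why one second of audio (50 frames) costs roughly 50 steps rather than 50 steps per codebook.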
Core Capabilities
- High-quality stereo music generation from text descriptions
- Support for various music styles and genres
- 50 Hz token rate (about 50 auto-regressive steps per second of audio)
- Integration with popular ML frameworks like 🤗 Transformers
- Efficient parallel processing design
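As a rough sketch of the 🤗 Transformers integration, the snippet below wraps text-to-music generation in a function. It assumes the checkpoint is published under the Hub id `facebook/musicgen-stereo-small` and that `transformers` and `torch` are installed; the helper names `tokens_for_duration` and `generate_stereo`, and the sampling settings, are illustrative choices rather than recommended defaults.

```python
FRAME_RATE_HZ = 50  # token frames per second of audio, as stated above


def tokens_for_duration(seconds: float) -> int:
    """Convert a target duration into a number of tokens to generate."""
    return int(round(seconds * FRAME_RATE_HZ))


def generate_stereo(prompt: str, seconds: float = 5.0,
                    model_id: str = "facebook/musicgen-stereo-small"):
    """Generate audio for a text prompt; returns (waveform, sampling_rate).

    Heavy imports live inside the function so the helper above can be
    used without loading the model. The model id is an assumption.
    """
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = MusicgenForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    audio_values = model.generate(
        **inputs,
        do_sample=True,
        guidance_scale=3,
        max_new_tokens=tokens_for_duration(seconds),
    )
    sampling_rate = model.config.audio_encoder.sampling_rate
    # First item of the batch; for a stereo checkpoint this is
    # expected to have shape (2, num_samples).
    return audio_values[0].cpu().numpy(), sampling_rate


# Example (downloads the weights on first use):
# waveform, rate = generate_stereo("lo-fi beat with warm piano", seconds=5)
# import soundfile as sf
# sf.write("musicgen_out.wav", waveform.T, rate)
```

Transposing the waveform before writing matters because `soundfile` expects samples along the first axis and channels along the second.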
Frequently Asked Questions
Q: What makes this model unique?
Unlike earlier methods such as MusicLM, MusicGen does not require a self-supervised semantic representation, and it generates all codebooks in a single pass. This variant is additionally fine-tuned to produce two-channel stereo output rather than mono audio.
Q: What are the recommended use cases?
The model is primarily intended for research on AI-based music generation, including probing the limitations of generative models and exploring text-guided music creation. It should not be used in downstream applications without further risk evaluation and mitigation, and the weights are released under the non-commercial CC-BY-NC 4.0 license.