MusicGen Large
| Property | Value |
|---|---|
| Model Size | 3.3B parameters |
| License | CC-BY-NC 4.0 |
| Author | Meta AI (FAIR team) |
| Paper | Simple and Controllable Music Generation |
What is musicgen-large?
MusicGen-Large is a text-to-music generation model developed by Meta AI's FAIR team. It is the largest variant (3.3B parameters) of the MusicGen family and generates high-quality instrumental music from English text descriptions at a 32 kHz sample rate. The model pairs a single-stage auto-regressive Transformer with an EnCodec tokenizer that uses 4 codebooks sampled at 50 Hz.
Implementation Details
The model architecture consists of two main components: an EnCodec model that tokenizes audio and an auto-regressive Transformer language model that predicts those tokens. All 4 codebooks are generated in parallel, with a small delay between consecutive codebooks, so only 50 auto-regressive steps are needed per second of audio.
- Sampling rate: 32 kHz
- Generation capability: Up to 8 seconds of music
- Training data: 20K hours of licensed music
- Input format: Text descriptions in English
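The codebook delay pattern described above can be sketched in a few lines of plain Python. This is an illustrative toy, not the actual Audiocraft implementation; names like `apply_delay` and `PAD` are made up for the example:

```python
# Toy illustration of MusicGen's codebook delay pattern: codebook k is
# shifted right by k steps, so at each auto-regressive step the model can
# predict one token per codebook in parallel. Sketch only -- `apply_delay`
# and `PAD` are illustrative names, not part of the audiocraft API.

K = 4            # number of EnCodec codebooks
FRAME_RATE = 50  # codebook frames per second of audio
PAD = -1         # placeholder for positions with no valid token yet

def apply_delay(frames: list[list[int]]) -> list[list[int]]:
    """Shift codebook k right by k positions (delay interleaving)."""
    T = len(frames[0])
    total = T + len(frames) - 1  # length of the delayed sequence
    return [[PAD] * k + row + [PAD] * (total - T - k)
            for k, row in enumerate(frames)]

# One second of audio is 50 frames, so decoding takes 50 + (K - 1) = 53
# auto-regressive steps -- roughly 50 steps per second, as stated above.
one_second = [[k * 1000 + t for t in range(FRAME_RATE)] for k in range(K)]
delayed = apply_delay(one_second)
print(len(delayed[0]))  # -> 53
```

Because every codebook advances by one token per step, the delay pattern avoids a separate decoding pass per codebook at the cost of only K − 1 extra steps per clip.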
Core Capabilities
- High-quality instrumental music generation from text descriptions
- Parallel generation of multiple audio codebooks
- Support for various musical styles and genres
- Integration with both Transformers and Audiocraft libraries
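For the Transformers path, generation looks roughly like the following sketch. It assumes `transformers` (with MusicGen support), `torch`, and `scipy` are installed; the prompt text and output filename are illustrative, and the large checkpoint is a substantial download:

```python
# Sketch of text-to-music generation via the Hugging Face Transformers
# integration. Prompt and output filename below are illustrative examples.

FRAME_RATE = 50  # auto-regressive steps (EnCodec frames) per second of audio

def seconds_to_tokens(seconds: float, frame_rate: int = FRAME_RATE) -> int:
    """Token budget for a clip of the given length."""
    return int(seconds * frame_rate)

def generate_clip(prompt: str, seconds: float = 5.0,
                  out_path: str = "musicgen_out.wav") -> None:
    # Imports are local so the sketch can be read without the heavy deps.
    import scipy.io.wavfile
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
    model = MusicgenForConditionalGeneration.from_pretrained(
        "facebook/musicgen-large")

    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    audio = model.generate(**inputs, do_sample=True,
                           max_new_tokens=seconds_to_tokens(seconds))

    rate = model.config.audio_encoder.sampling_rate  # 32000 for MusicGen
    scipy.io.wavfile.write(out_path, rate=rate, data=audio[0, 0].numpy())

# e.g. generate_clip("a gentle acoustic folk melody with fingerpicked guitar")
```

The Audiocraft library offers an equivalent high-level API; the Transformers route is shown here because it needs no extra framework beyond the standard Hugging Face stack.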
Frequently Asked Questions
Q: What makes this model unique?
Unlike other models such as MusicLM, MusicGen doesn't require a self-supervised semantic representation and can generate all codebooks in a single pass, making it more efficient and straightforward to use.
Q: What are the recommended use cases?
The model is primarily intended for research in AI-based music generation, including studying model limitations and capabilities. It should not be used for commercial applications without proper evaluation and risk mitigation.