MusicGen Large
| Property | Value |
|---|---|
| Model Size | 3.3B parameters |
| License | CC-BY-NC 4.0 |
| Author | Meta AI (FAIR team) |
| Paper | Simple and Controllable Music Generation |
What is musicgen-large?
MusicGen-Large is a text-to-music generation model developed by Meta AI's FAIR team. It is the largest variant (3.3B parameters) of the MusicGen family and generates high-quality instrumental music from English text descriptions at a 32 kHz sample rate. The model pairs a single-stage auto-regressive Transformer with an EnCodec tokenizer that uses 4 codebooks sampled at 50 Hz.
Implementation Details
The model architecture consists of two main components: an EnCodec model that tokenizes audio and an auto-regressive Transformer language model that predicts those tokens. All 4 codebooks are generated in parallel, with a small delay between consecutive codebooks, so only 50 auto-regressive steps are needed per second of audio.
- Sampling rate: 32 kHz
- Generation capability: Up to 8 seconds of music
- Training data: 20K hours of licensed music
- Input format: Text descriptions in English
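The codebook delay pattern described above can be sketched in a few lines of plain Python. This is an illustrative toy, not the actual Audiocraft implementation; names like `apply_delay` and `PAD` are made up for the example:

```python
# Toy illustration of MusicGen's codebook delay pattern: codebook k is
# shifted right by k steps, so at each auto-regressive step the model can
# predict one token per codebook in parallel. Sketch only -- `apply_delay`
# and `PAD` are illustrative names, not part of the audiocraft API.

K = 4            # number of EnCodec codebooks
FRAME_RATE = 50  # codebook frames per second of audio
PAD = -1         # placeholder for positions with no valid token yet

def apply_delay(frames: list[list[int]]) -> list[list[int]]:
    """Shift codebook k right by k positions (delay interleaving)."""
    T = len(frames[0])
    total = T + len(frames) - 1  # length of the delayed sequence
    return [[PAD] * k + row + [PAD] * (total - T - k)
            for k, row in enumerate(frames)]

# One second of audio is 50 frames, so decoding takes 50 + (K - 1) = 53
# auto-regressive steps -- roughly 50 steps per second, as stated above.
one_second = [[k * 1000 + t for t in range(FRAME_RATE)] for k in range(K)]
delayed = apply_delay(one_second)
print(len(delayed[0]))  # -> 53
```

Because every codebook advances by one token per step, the delay pattern avoids a separate decoding pass per codebook at the cost of only K − 1 extra steps per clip.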
Core Capabilities
- High-quality instrumental music generation from text descriptions
- Parallel generation of multiple audio codebooks
- Support for various musical styles and genres
- Integration with both Transformers and Audiocraft libraries
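For the Transformers path, generation looks roughly like the following sketch. It assumes `transformers` (with MusicGen support), `torch`, and `scipy` are installed; the prompt text and output filename are illustrative, and the large checkpoint is a substantial download:

```python
# Sketch of text-to-music generation via the Hugging Face Transformers
# integration. Prompt and output filename below are illustrative examples.

FRAME_RATE = 50  # auto-regressive steps (EnCodec frames) per second of audio

def seconds_to_tokens(seconds: float, frame_rate: int = FRAME_RATE) -> int:
    """Token budget for a clip of the given length."""
    return int(seconds * frame_rate)

def generate_clip(prompt: str, seconds: float = 5.0,
                  out_path: str = "musicgen_out.wav") -> None:
    # Imports are local so the sketch can be read without the heavy deps.
    import scipy.io.wavfile
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
    model = MusicgenForConditionalGeneration.from_pretrained(
        "facebook/musicgen-large")

    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    audio = model.generate(**inputs, do_sample=True,
                           max_new_tokens=seconds_to_tokens(seconds))

    rate = model.config.audio_encoder.sampling_rate  # 32000 for MusicGen
    scipy.io.wavfile.write(out_path, rate=rate, data=audio[0, 0].numpy())

# e.g. generate_clip("a gentle acoustic folk melody with fingerpicked guitar")
```

The Audiocraft library offers an equivalent high-level API; the Transformers route is shown here because it needs no extra framework beyond the standard Hugging Face stack.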
Frequently Asked Questions
Q: What makes this model unique?
Unlike other models such as MusicLM, MusicGen doesn't require a self-supervised semantic representation and can generate all codebooks in a single pass, making it more efficient and straightforward to use.
Q: What are the recommended use cases?
The model is primarily intended for research in AI-based music generation, including studying model limitations and capabilities. It should not be used for commercial applications without proper evaluation and risk mitigation.