musicgen-stereo-small

facebook

A 300M parameter stereo music generation model capable of creating high-quality stereo audio from text descriptions, part of Facebook's MusicGen family.

Parameters: 300M
Developer: Meta AI (FAIR team)
Release Date: 2023
License: Code: MIT, Weights: CC-BY-NC 4.0
Paper: Simple and Controllable Music Generation

What is musicgen-stereo-small?

MusicGen Stereo Small is a text-to-music generation model that produces stereophonic audio output. It is a fine-tuned version of the original MusicGen small model, adapted to create stereo music with greater spatial depth and directionality. The model operates at 32 kHz with 4 codebooks and produces stereo by generating two token streams that are interleaved with a delay pattern.

Implementation Details

The model is built on a single-stage auto-regressive Transformer architecture that works in conjunction with an EnCodec tokenizer. For stereo output it processes two separate audio streams, interleaving them using a delay pattern. Because each codebook is offset from the previous one by a small delay, the model can predict all 4 codebooks in parallel in a single pass, so it needs only 50 auto-regressive steps per second of audio.

  • 32 kHz sampling rate with EnCodec tokenization
  • 4 codebooks sampled at 50 Hz
  • Parallel prediction capability through small delays between codebooks
  • Trained on licensed music data from the Meta Music Initiative Sound Collection, Shutterstock, and Pond5
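
The delay pattern described above can be illustrated with a small sketch. This is not Meta's implementation: the function names, the `PAD` placeholder value, and the one-step delay per codebook are assumptions chosen for illustration. It shows how shifting codebook k right by k steps lets every codebook be predicted at each step, so T frames need T + K - 1 steps instead of the K * T steps a fully flattened sequence would need.

```python
# Illustrative sketch only; names and the PAD value are hypothetical.
PAD = -1  # marks positions where a codebook has no token at that step


def apply_delay_pattern(codes):
    """Shift codebook k right by k steps.

    codes: list of K lists, each holding T token ids (one list per codebook).
    Returns K lists of length T + K - 1, padded with PAD.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = []
    for k, row in enumerate(codes):
        left = [PAD] * k
        right = [PAD] * (total_steps - num_frames - k)
        delayed.append(left + list(row) + right)
    return delayed


def undo_delay_pattern(delayed):
    """Invert apply_delay_pattern, recovering the aligned K x T token grid."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - num_codebooks + 1
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# Toy grid: 4 codebooks, 5 frames; token id encodes (codebook, frame).
codes = [[10 * k + t for t in range(5)] for k in range(4)]
delayed = apply_delay_pattern(codes)
for row in delayed:
    print(row)

# Step counts for one second of audio at 50 frames per second, 4 codebooks.
frames_per_second, num_codebooks = 50, 4
flattened_steps = frames_per_second * num_codebooks
delayed_steps = frames_per_second + num_codebooks - 1
print(flattened_steps, delayed_steps)
```

In this toy setting the delayed layout adds only K - 1 extra steps per sequence, which is why the step count stays close to the 50 Hz frame rate.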

Core Capabilities

  • High-quality stereo music generation from text descriptions
  • Support for various music styles and genres
  • 50 Hz token frame rate (50 auto-regressive steps per second of audio)
  • Integration with popular ML frameworks like 🤗 Transformers
  • Efficient parallel processing design
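
As a rough sketch of the 🤗 Transformers integration, the snippet below wraps the `text-to-audio` pipeline in a helper. Several things here are assumptions rather than facts from this page: the Hub identifier `facebook/musicgen-stereo-small`, the helper names `tokens_for_duration` and `generate_clip`, the use of the `soundfile` package for writing the file, and the `[0].T` indexing, which assumes the pipeline returns audio shaped (batch, channels, samples). Output shapes can differ between library versions, so check them before relying on this.

```python
FRAME_RATE_HZ = 50  # token frames per second of audio, as stated above


def tokens_for_duration(seconds):
    """Approximate number of new tokens needed for a clip of this length."""
    return int(round(seconds * FRAME_RATE_HZ))


def generate_clip(prompt, seconds=5.0, out_path="musicgen_out.wav"):
    """Generate a stereo clip from a text prompt and write it to a WAV file.

    Imports are local so the helper above can be used without these packages.
    Requires `transformers`, `torch`, and `soundfile`, and downloads weights.
    """
    import soundfile as sf
    from transformers import pipeline

    # Assumed Hub identifier for this model.
    synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-small")
    music = synthesiser(
        prompt,
        forward_params={"max_new_tokens": tokens_for_duration(seconds)},
    )
    # Assumes audio is (batch, channels, samples); soundfile expects
    # (samples, channels), hence the transpose of the first batch item.
    sf.write(out_path, music["audio"][0].T, music["sampling_rate"])
    return out_path
```

Calling `generate_clip("lo-fi music with a soothing melody", seconds=5)` would request about 250 new tokens. The actual clip length may come out slightly shorter than requested because of the delay between codebooks.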

Frequently Asked Questions

Q: What makes this model unique?

Unlike existing methods such as MusicLM, this model does not require a self-supervised semantic representation, and it generates stereophonic audio directly rather than mono. It is also one of the few models specifically designed for stereo music generation.

Q: What are the recommended use cases?

The model is primarily intended for research purposes in AI-based music generation, including understanding generative model limitations and exploring text-guided music creation. It's not recommended for commercial applications without proper risk evaluation and mitigation.
