musicgen-stereo-large

facebook

A 3.46B-parameter text-to-music model that generates high-quality stereo audio from text descriptions.

Property          Value
Parameter Count   3.46B
Model Type        Text-to-Audio Generation
Architecture      Transformer-based
License           CC-BY-NC 4.0
Paper             Simple and Controllable Music Generation

What is musicgen-stereo-large?

MusicGen Stereo Large is a text-to-music generation model developed by Meta AI (Facebook). It is a fine-tuned version of the original mono MusicGen model, adapted to produce two-channel stereo output, which gives the generated music a sense of depth and direction that mono audio lacks.

Implementation Details

The model is trained over an EnCodec tokenizer that encodes 32 kHz audio into 4 codebooks sampled at 50 Hz. To generate stereo audio, it takes two streams of tokens from EnCodec, one for the left channel and one for the right, and interleaves them using a delay pattern. The stereo model was fine-tuned for 200,000 updates starting from the original mono version.
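
The figures in this paragraph imply some simple arithmetic, sketched below. The sketch assumes that each of the two stereo token streams keeps all 4 codebooks, which the text does not state explicitly, and `token_budget` is a hypothetical helper name used only for illustration.

```python
SAMPLE_RATE_HZ = 32_000    # audio sampling rate given in the text
FRAME_RATE_HZ = 50         # codebook frames per second given in the text
CODEBOOKS_PER_STREAM = 4   # codebooks per EnCodec token stream given in the text
STEREO_STREAMS = 2         # assumption: left and right streams each keep 4 codebooks


def token_budget(seconds, streams=STEREO_STREAMS):
    """Return frame and token counts for a clip of the given length."""
    frames = round(seconds * FRAME_RATE_HZ)
    return {
        # How many raw audio samples one codebook frame summarises.
        "samples_per_frame": SAMPLE_RATE_HZ // FRAME_RATE_HZ,
        # One frame per autoregressive step.
        "frames": frames,
        # Every frame carries one token per codebook per stream.
        "tokens": frames * CODEBOOKS_PER_STREAM * streams,
    }


if __name__ == "__main__":
    print("stereo, 10 s:", token_budget(10))
    print("mono,   10 s:", token_budget(10, streams=1))
```

Under these assumptions, each frame covers 640 audio samples, and a 10-second stereo clip corresponds to 500 frames and 4,000 tokens, twice the token count of a mono clip of the same length.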

  • Single-stage autoregressive Transformer architecture
  • Generates all 4 codebooks in one pass
  • 50 autoregressive steps per second of audio
  • 32 kHz audio sampling rate
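
The delay pattern that lets all codebooks be produced in one pass can be illustrated with a minimal sketch. It assumes that codebook k is shifted right by k steps, and that in the stereo case the left and right codebooks at the same level share a delay; neither detail is stated above. `PAD`, `apply_delay_pattern`, and `undo_delay_pattern` are hypothetical names, not part of any library.

```python
PAD = -1  # hypothetical placeholder for positions that hold no token

MONO_DELAYS = [0, 1, 2, 3]                 # assumption: codebook k is delayed by k steps
STEREO_DELAYS = [0, 0, 1, 1, 2, 2, 3, 3]   # assumption: left/right pairs share a delay


def apply_delay_pattern(codes, delays):
    """Shift each codebook row right by its delay and pad the gaps.

    codes:  list of K rows, each a list of T token ids.
    delays: list of K non-negative integer offsets, one per row.
    Returns K rows of length T + max(delays).
    """
    if len(codes) != len(delays):
        raise ValueError("need exactly one delay per codebook row")
    num_frames = len(codes[0])
    total_steps = num_frames + max(delays)
    delayed = []
    for row, delay in zip(codes, delays):
        tail = total_steps - num_frames - delay
        delayed.append([PAD] * delay + list(row) + [PAD] * tail)
    return delayed


def undo_delay_pattern(delayed, delays):
    """Realign the rows so that column t again holds frame t of every codebook."""
    num_frames = len(delayed[0]) - max(delays)
    return [row[delay:delay + num_frames] for row, delay in zip(delayed, delays)]


if __name__ == "__main__":
    # Toy tokens: row k, frame t is encoded as 10 * k + t.
    codes = [[10 * k + t for t in range(5)] for k in range(4)]
    for row in apply_delay_pattern(codes, MONO_DELAYS):
        print(row)
```

With 4 codebooks and 5 frames, the delayed grid has 5 + 3 = 8 columns. Each column is predicted in a single autoregressive step, so the step count grows with the number of frames plus a small constant rather than with frames times codebooks.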

Core Capabilities

  • High-quality stereo music generation from text descriptions
  • Two-channel (left/right) output rather than mono
  • Support for various music styles and genres
  • Parallel prediction of codebooks, enabled by a small delay between them
  • Usable from PyTorch-based ML tooling
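
As a rough illustration of the text-to-music workflow, the sketch below assumes the Hugging Face Transformers library (with PyTorch) and the `soundfile` package, and assumes the checkpoint is published under the Hub id `facebook/musicgen-stereo-large`; none of these is named above. `frames_for_duration` and `generate_stereo_clip` are hypothetical helper names. The sketch also assumes that `max_new_tokens` counts 50 Hz frames, so roughly 50 per second of audio.

```python
def frames_for_duration(seconds, frame_rate_hz=50):
    """Convert a clip length in seconds to a number of generation steps."""
    return round(seconds * frame_rate_hz)


def generate_stereo_clip(prompt, seconds=5.0, out_path="musicgen_out.wav"):
    """Generate a stereo clip from a text prompt and write it to a WAV file.

    Downloads several gigabytes of weights on first use; a GPU is advisable.
    """
    # Imported here so the helper above works without these packages installed.
    import soundfile as sf
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    model_id = "facebook/musicgen-stereo-large"  # assumed Hub id
    processor = AutoProcessor.from_pretrained(model_id)
    model = MusicgenForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    audio = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=frames_for_duration(seconds),
    )

    sampling_rate = model.config.audio_encoder.sampling_rate
    # The output is laid out as (batch, channels, samples);
    # soundfile expects (samples, channels), hence the transpose.
    sf.write(out_path, audio[0].cpu().numpy().T, sampling_rate)
    return out_path


if __name__ == "__main__":
    print(generate_stereo_clip("lo-fi music with a soothing melody"))
```

Keeping the duration-to-steps conversion in its own function makes the 50 Hz assumption easy to find and change if it turns out not to hold for a given checkpoint.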

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for generating true stereophonic sound, which gives a more immersive listening experience than mono models. At 3.46B parameters, it is also among the largest models in the MusicGen family, and it produces high-quality output without requiring a self-supervised semantic representation.

Q: What are the recommended use cases?

The model is intended primarily for research on AI-based music generation, such as probing the capabilities and limitations of generative models. It also suits ML enthusiasts who want to explore text-guided music generation. Its CC-BY-NC 4.0 license does not permit commercial use.
