musicgen-stereo-large

Maintained by: facebook

MusicGen Stereo Large

Property          Value
Parameter Count   3.46B
Model Type        Text-to-Audio Generation
Architecture      Transformer-based
License           CC-BY-NC 4.0
Paper             Simple and Controllable Music Generation

What is musicgen-stereo-large?

MusicGen Stereo Large is a text-to-music generation model developed by Facebook that produces stereophonic audio. It is a fine-tuned version of the original mono MusicGen model, adapted to output two channels so that the generated music carries spatial depth and directional sound.

Implementation Details

The model is trained over an EnCodec tokenizer that operates on 32 kHz audio and represents it with 4 codebooks sampled at 50 Hz. To produce stereo audio, it works with two separate token streams, one per channel, and interleaves them using a delay pattern. The stereo model was fine-tuned for 200,000 updates starting from the original mono version.

  • Single-stage autoregressive Transformer architecture
  • Generates all 4 codebooks in one pass
  • 50 autoregressive steps per second of audio
  • 32 kHz audio sampling rate
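
The interleaving and delay pattern described above can be sketched in plain Python. This is an illustrative sketch, not the model's actual code: the function names, the `PAD` marker, the `[L0, R0, L1, R1, ...]` stream order, and the choice to give both channels of codebook k the same delay of k steps are all assumptions made for the example.

```python
PAD = -1  # hypothetical marker for "no token at this step"


def interleave_stereo(left, right):
    """Merge two K x T token grids into one 2K x T grid: [L0, R0, L1, R1, ...]."""
    streams = []
    for left_row, right_row in zip(left, right):
        streams.append(list(left_row))
        streams.append(list(right_row))
    return streams


def apply_delay_pattern(streams, delays):
    """Shift stream i to the right by delays[i] steps, filling gaps with PAD."""
    n_frames = len(streams[0])
    n_steps = n_frames + max(delays)
    return [
        [PAD] * d + list(row) + [PAD] * (n_steps - n_frames - d)
        for row, d in zip(streams, delays)
    ]


def undo_delay_pattern(delayed, delays, n_frames):
    """Recover the original time-aligned grid from the delayed one."""
    return [row[d:d + n_frames] for row, d in zip(delayed, delays)]


# One second of audio: 4 codebooks per channel at 50 frames per second.
N_CODEBOOKS, FRAME_RATE = 4, 50
left = [[1000 * k + t for t in range(FRAME_RATE)] for k in range(N_CODEBOOKS)]
right = [[1000 * k + 500 + t for t in range(FRAME_RATE)] for k in range(N_CODEBOOKS)]

# Assumed delays: both channels of codebook k are delayed by k steps.
delays = [0, 0, 1, 1, 2, 2, 3, 3]

streams = interleave_stereo(left, right)
delayed = apply_delay_pattern(streams, delays)

print(len(delayed), "streams,", len(delayed[0]), "steps")  # 8 streams, 53 steps
```

Because every step emits one token for each stream at once, one second of audio costs roughly 50 autoregressive steps (53 here, counting the short tail introduced by the delays) instead of the 8 × 50 = 400 steps that predicting every token one at a time would need.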

Core Capabilities

  • High-quality stereo music generation from text descriptions
  • Two-channel (left and right) output rather than mono
  • Support for various music styles and genres
  • Parallel prediction of the codebooks, made possible by the delay pattern
  • Integration with popular ML frameworks like PyTorch
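
As a rough illustration of the PyTorch integration, the sketch below loads the model through the Hugging Face Transformers library and generates one clip from a text prompt. The repository id `facebook/musicgen-stereo-large` is inferred from the model name and maintainer listed on this page, and the helper names `tokens_for_duration` and `generate_stereo` are hypothetical. The token budget assumes the 50 Hz frame rate given above, so the resulting clip length is approximate.

```python
FRAME_RATE_HZ = 50  # token frames per second of audio, as stated above


def tokens_for_duration(seconds, frame_rate_hz=FRAME_RATE_HZ):
    """Approximate number of new tokens to request for a clip of this length."""
    return int(round(seconds * frame_rate_hz))


def generate_stereo(prompt, seconds=5.0, model_id="facebook/musicgen-stereo-large"):
    """Generate one clip from a text prompt; returns (audio, sampling_rate).

    model_id is an assumed repository id, not confirmed by this page.
    """
    # Imported here so the helper above can be used without the heavy dependencies.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = MusicgenForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    audio = model.generate(**inputs, max_new_tokens=tokens_for_duration(seconds))

    sampling_rate = model.config.audio_encoder.sampling_rate
    # Expected layout is (batch, channels, samples); take the first batch item.
    return audio[0].cpu().numpy(), sampling_rate


# Example call (downloads several gigabytes of weights on first use):
# audio, rate = generate_stereo("80s pop track with bassy drums and synth")
# print(audio.shape, rate)
```

For a stereo checkpoint, the returned array should have two rows, one per channel, at the sampling rate reported by the model's audio encoder configuration.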

Frequently Asked Questions

Q: What makes this model unique?

This model generates two-channel stereophonic audio, which gives a more immersive listening experience than mono models. At 3.46B parameters it is also a comparatively large music generation model, and it produces its output without requiring a self-supervised semantic representation.

Q: What are the recommended use cases?

The model is primarily intended for research in AI-based music generation, including studying the capabilities and limitations of generative models. It is particularly useful for researchers and ML enthusiasts exploring text-guided music generation. It should not be used for commercial applications, because its CC-BY-NC 4.0 license restricts use to non-commercial purposes.
