TangoFlux

Maintained By
declare-lab

Property    Value
Author      declare-lab
License     Stability AI Community License (Non-commercial research only)
Paper       arXiv:2412.21037

What is TangoFlux?

TangoFlux is a text-to-audio generation model that combines flow matching with CLAP-Ranked Preference Optimization (CRPO). It generates audio at a 44.1 kHz sampling rate for durations of up to 30 seconds, which makes it well suited to creating sound effects and ambient audio from textual descriptions.

Implementation Details

The architecture is built around FluxTransformer blocks, which combine Diffusion Transformer (DiT) and Multimodal Diffusion Transformer (MMDiT) components. Training proceeds in three stages: pre-training, fine-tuning, and preference optimization. A variational autoencoder (VAE) encodes audio into a latent representation, and generation follows a rectified flow trajectory.
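To make the rectified flow idea concrete, here is a minimal sketch of fixed-step Euler sampling along a straight noise-to-data path. It is not TangoFlux's code: `euler_sample` and `make_toy_velocity` are hypothetical names, the "latent" is a three-element list, and the velocity function is an idealized stand-in for the trained transformer. The time convention (t = 0 is noise, t = 1 is data) is also an assumption; implementations differ on this.

```python
import random


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Stand-in for a trained network: ideal straight-line velocity toward one known target."""
    def velocity(x, t):
        # On the straight path x_t = (1 - t) * x0 + t * target, the constant
        # velocity (target - x0) can be rewritten as (target - x_t) / (1 - t).
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


rng = random.Random(0)
target = [0.5, -1.0, 2.0]  # hypothetical "clean latent", chosen for illustration
noise = [rng.gauss(0.0, 1.0) for _ in target]

# 25 steps is the low end of the sampling-step range quoted in this document.
latent = euler_sample(make_toy_velocity(target), noise, num_steps=25)
```

Because the toy path is exactly straight, the Euler steps land on `target` regardless of the step count. A learned velocity field is only approximately straight, which is why the number of sampling steps trades quality against speed in practice.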

  • Uses CRPO (CLAP-Ranked Preference Optimization) for model alignment
  • Generates audio with 25-50 sampling steps
  • Supports text prompts with duration control
  • Outputs audio at a 44.1 kHz sampling rate
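The CRPO bullet above can be illustrated with a small sketch of how ranked candidates might become a preference pair. It assumes that several candidates are generated per prompt, each is scored against the prompt, and the highest- and lowest-scoring ones form the (preferred, rejected) pair. The names `build_preference_pair` and `toy_score` are hypothetical, and the word-overlap scorer only stands in for a CLAP text-audio similarity; the candidates here are caption strings rather than audio.

```python
def build_preference_pair(prompt, candidates, score_fn):
    """Return (preferred, rejected) as the best- and worst-scoring candidates."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    scored = [(score_fn(prompt, c), c) for c in candidates]
    preferred = max(scored, key=lambda pair: pair[0])[1]
    rejected = min(scored, key=lambda pair: pair[0])[1]
    return preferred, rejected


def toy_score(prompt, candidate):
    """Hypothetical stand-in for a CLAP score: fraction of prompt words in the candidate."""
    prompt_words = set(prompt.lower().split())
    candidate_words = set(candidate.lower().split())
    return len(prompt_words & candidate_words) / len(prompt_words)


# Text descriptions stand in for generated audio clips in this toy example.
candidates = ["a dog barking loudly", "rain on a roof", "dog"]
preferred, rejected = build_preference_pair("dog barking", candidates, toy_score)
```

In a real pipeline the scorer would compare the prompt with audio embeddings, and the resulting pairs would feed the preference optimization stage.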

Core Capabilities

  • High-fidelity audio generation from text descriptions
  • Audio durations of up to 30 seconds
  • Configurable number of sampling steps to trade quality against speed
  • Preference-based training (CRPO) for improved output quality
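As a rough illustration of the limits listed above, the sketch below validates a generation request and computes how many samples per channel the output would contain. `GenerationRequest` and its default values are hypothetical and not part of any TangoFlux API; only the 44.1 kHz rate and the 30-second ceiling come from this document.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 44_100   # sampling rate stated in this document
MAX_DURATION_S = 30.0     # maximum duration stated in this document


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for a text prompt, a duration, and a step count."""
    prompt: str
    duration_s: float = 10.0  # assumed default, for illustration only
    steps: int = 50           # assumed default; the document quotes 25-50 steps

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 0 < self.duration_s <= MAX_DURATION_S:
            raise ValueError(f"duration_s must be in (0, {MAX_DURATION_S}]")
        if self.steps < 1:
            raise ValueError("steps must be at least 1")

    @property
    def num_samples(self):
        """Number of audio samples per channel at the stated sampling rate."""
        return round(self.duration_s * SAMPLE_RATE_HZ)


request = GenerationRequest("rain on a tin roof", duration_s=30, steps=25)
```

A full 30-second clip works out to 30 × 44,100 = 1,323,000 samples per channel, which gives a sense of why generation happens in a compressed latent space rather than on raw waveforms.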

Frequently Asked Questions

Q: What makes this model unique?

TangoFlux combines flow matching with CLAP-ranked preference optimization, which allows audio generation that is both fast and high in quality. Its ability to generate up to 30 seconds of audio at a 44.1 kHz sampling rate sets it apart from many existing solutions.

Q: What are the recommended use cases?

The model suits research applications that need text-to-audio generation, such as sound effect creation, ambient audio generation, and audio content production. Note that it is licensed for non-commercial research use only.
