PixArt-LCM-XL-2-1024-MS

Maintained By
PixArt-alpha


| Property | Value |
| --- | --- |
| License | OpenRAIL++ |
| Model Type | Diffusion-Transformer-based text-to-image |
| Primary Paper | PixArt-α |
| Secondary Paper | LCM |

What is PixArt-LCM-XL-2-1024-MS?

PixArt-LCM-XL-2-1024-MS is a text-to-image generation model that combines the PixArt-α architecture with Latent Consistency Model (LCM) distillation to achieve very fast sampling. The model generates high-quality 1024px images in just 4 inference steps, substantially faster than traditional models such as SDXL.

Implementation Details

The model uses pure transformer blocks for latent diffusion, together with a T5 text encoder and a VAE latent feature encoder. It applies LCM's diffusion-distillation method to predict the solution of the probability-flow ODE (PF-ODE) directly in latent space, enabling extremely fast inference.

  • Supports 1024px resolution image generation
  • Requires only 4 inference steps
  • Compatible with torch.compile for 20-30% speed improvement
  • Supports CPU offloading for limited VRAM scenarios
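The points above translate into a short diffusers snippet. This is a minimal sketch, assuming a recent diffusers release that ships `PixArtAlphaPipeline` and a CUDA device; weights are downloaded on first run, and the prompt is only an illustration.

```python
# Minimal 4-step generation sketch for PixArt-LCM-XL-2-1024-MS.
# Assumes a diffusers version that includes PixArtAlphaPipeline.

def generate(prompt: str, num_inference_steps: int = 4):
    import torch
    from diffusers import PixArtAlphaPipeline

    pipe = PixArtAlphaPipeline.from_pretrained(
        "PixArt-alpha/PixArt-LCM-XL-2-1024-MS", torch_dtype=torch.float16
    )
    pipe.to("cuda")  # or pipe.enable_model_cpu_offload() on limited VRAM

    # LCM-distilled models run with guidance_scale=0.0: guidance is folded
    # in during distillation, so classifier-free guidance is skipped here.
    return pipe(
        prompt, num_inference_steps=num_inference_steps, guidance_scale=0.0
    ).images[0]


if __name__ == "__main__":
    image = generate("A small cactus with a happy face in the Sahara desert")
    image.save("cactus.png")
```

For the optional torch.compile speedup mentioned above, `pipe.transformer = torch.compile(pipe.transformer)` can be applied before the first call, at the cost of a one-time compilation delay.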

Core Capabilities

  • Ultra-fast generation: 0.51s on A100, 3.3s on T4
  • High-resolution output (1024px)
  • Efficient resource utilization
  • Artistic and creative image generation
  • Research-focused applications

Frequently Asked Questions

Q: What makes this model unique?

The combination of the PixArt-α architecture with LCM distillation enables very fast sampling: high-quality 1024px images in just 4 steps, versus the ~25 steps SDXL typically needs, while maintaining comparable quality.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, including artwork generation, educational tools, creative applications, and studying generative AI limitations and biases. It's not recommended for generating factual content or true representations of people/events.
