PixArt-LCM-XL-2-1024-MS
| Property | Value |
|---|---|
| License | OpenRAIL++ |
| Model Type | Diffusion-Transformer-based text-to-image |
| Primary Paper | PixArt-α |
| Secondary Paper | LCM |
What is PixArt-LCM-XL-2-1024-MS?
PixArt-LCM-XL-2-1024-MS is a text-to-image generation model that combines the PixArt-α Diffusion Transformer architecture with Latent Consistency Models (LCM) to achieve very fast generation. The model produces high-quality 1024px images in as few as 4 inference steps, significantly fewer than the 25+ steps typically used by models such as SDXL.
Implementation Details
The model uses pure transformer blocks for latent diffusion, together with a T5 text encoder and a VAE latent feature encoder. It applies LCM's diffusion distillation method to predict the solution of the probability-flow ODE (PF-ODE) directly in latent space, enabling very fast inference (see the usage sketch after the list below).
- Supports 1024px resolution image generation
- Requires only 4 inference steps
- Compatible with torch.compile for a 20-30% speed improvement
- Supports CPU offloading for limited VRAM scenarios
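
The snippet below is a minimal usage sketch of the points above, assuming the checkpoint is loaded through diffusers' PixArtAlphaPipeline; the repo id `PixArt-alpha/PixArt-LCM-XL-2-1024-MS` and the example prompt are assumptions to be checked against the hosting page. Following common LCM practice, guidance is distilled into the model, so `guidance_scale` is set to 0.0.

```python
import torch
from diffusers import PixArtAlphaPipeline

# Load the LCM-distilled PixArt checkpoint in fp16.
# NOTE: the repo id below is an assumption; confirm the exact id on the model page.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-LCM-XL-2-1024-MS",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Optional: offload submodules to CPU when VRAM is limited.
# pipe.enable_model_cpu_offload()

# Optional: compile the transformer for an additional ~20-30% speedup
# (the first call is slower while compilation runs).
# pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)

prompt = "A small cactus with a happy face in the Sahara desert."

# LCM checkpoints are sampled with very few steps and no classifier-free guidance.
image = pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
image.save("cactus.png")
```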
Core Capabilities
- Ultra-fast generation: 0.51 s per image on an A100, 3.3 s on a T4 (a timing sketch follows this list)
- High-resolution output (1024px)
- Efficient resource utilization
- Artistic and creative image generation
- Research-focused applications
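
The latency figures above depend on hardware and setup; the sketch below shows one way to measure per-image latency locally, reusing the `pipe` object from the previous snippet. The warm-up call and CUDA synchronization are needed for a fair measurement.

```python
import time
import torch

prompt = "A cozy cabin in a snowy forest at dusk"  # hypothetical example prompt

# Warm-up run (triggers lazy initialization and any torch.compile work).
pipe(prompt, num_inference_steps=4, guidance_scale=0.0)

# Time a single 4-step generation, synchronizing so GPU work is fully counted.
torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt, num_inference_steps=4, guidance_scale=0.0)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"Latency for one 1024px image at 4 steps: {elapsed:.2f} s")
```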
Frequently Asked Questions
Q: What makes this model unique?
The combination of the PixArt-α architecture with LCM distillation enables high-quality 1024px images in just 4 steps, compared to the roughly 25 steps SDXL typically uses, while maintaining comparable quality.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, including artwork generation, educational tools, creative applications, and studying generative AI limitations and biases. It's not recommended for generating factual content or true representations of people/events.