PixArt-XL-2-1024-MS

Maintained by: PixArt-alpha

Property        Value
License         OpenRAIL++
Parameters      0.6B
Training Data   25M images
Training Cost   675 A100 GPU days
Paper           arXiv:2310.00426

What is PixArt-XL-2-1024-MS?

PixArt-XL-2-1024-MS is a text-to-image diffusion transformer that pairs low training cost with high-quality image generation. It generates 1024px images directly from text prompts in a single sampling process, using pure transformer blocks for latent diffusion.

Implementation Details

The model uses T5 for text encoding and a VAE for latent feature encoding. It runs through the diffusers library, and on torch >= 2.0 the transformer can be wrapped in torch.compile for 20-30% faster inference. Training took far less compute than comparable models: 675 A100 GPU days, compared with 6,250 for SD 1.5.

  • Compact architecture with 0.6B parameters
  • Supports high-resolution 1024px image generation
  • Compatible with various sampling methods, including SA-Solver
  • Supports CPU offloading for limited-VRAM setups
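
The options above can be sketched as a small helper built on the diffusers `PixArtAlphaPipeline` class. This is a minimal sketch, not the maintainers' reference code: the Hub identifier `PixArt-alpha/PixArt-XL-2-1024-MS`, the `RunConfig` dataclass, the `generate` function, the float16 dtype, the `cuda` device, and the `torch.compile` settings are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed Hugging Face Hub identifier (maintainer name + model name).
MODEL_ID = "PixArt-alpha/PixArt-XL-2-1024-MS"


@dataclass
class RunConfig:
    """Hypothetical container for the options discussed in this section."""
    prompt: str
    low_vram: bool = False             # offload idle sub-models to the CPU
    compile_transformer: bool = False  # use torch.compile (torch >= 2.0)
    output_path: str = "pixart.png"


def generate(cfg: RunConfig):
    """Load the pipeline, apply the chosen optimizations, and save one image."""
    # Heavy imports live inside the function so this module can be imported
    # on a machine that has no GPU stack installed.
    import torch
    from diffusers import PixArtAlphaPipeline

    # Assumption: half precision on a CUDA GPU.
    pipe = PixArtAlphaPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

    if cfg.low_vram:
        # Keeps each sub-model on the CPU until it is needed.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")

    if cfg.compile_transformer:
        # One possible torch.compile configuration; the first call is slow
        # because compilation happens on first use.
        pipe.transformer = torch.compile(
            pipe.transformer, mode="reduce-overhead", fullgraph=True
        )

    image = pipe(prompt=cfg.prompt).images[0]
    image.save(cfg.output_path)
    return image
```

Calling `generate(RunConfig(prompt="An astronaut riding a green horse"))` downloads the weights on first use and writes `pixart.png`. Setting `low_vram=True` swaps the full GPU placement for CPU offloading, which trades some speed for lower peak VRAM.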

Core Capabilities

  • Direct generation of 1024px images from text
  • Comparable or better quality than SDXL 0.9 and DALLE-2 in user studies
  • 90% reduction in training costs and CO2 emissions compared to SD 1.5
  • Efficient memory usage with various optimization options

Frequently Asked Questions

Q: What makes this model unique?

The model's main distinction is training efficiency: it needs only 10.8% of Stable Diffusion v1.5's training time (675 versus 6,250 A100 GPU days) while delivering comparable or better output quality.
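
The 10.8% figure follows directly from the two GPU-day numbers quoted on this page. A short check, where the function name `training_cost_ratio` is only illustrative:

```python
# GPU-day figures as quoted in this model card.
PIXART_GPU_DAYS = 675
SD15_GPU_DAYS = 6250


def training_cost_ratio(model_days: float, baseline_days: float) -> float:
    """Return the model's training time as a fraction of the baseline's."""
    return model_days / baseline_days


ratio = training_cost_ratio(PIXART_GPU_DAYS, SD15_GPU_DAYS)
reduction = 1.0 - ratio

print(f"share of SD 1.5 training time: {ratio:.1%}")
print(f"reduction in GPU days: {reduction:.1%}")
```

The reduction in GPU days comes out at about 89%, in line with the roughly 90% saving listed under Core Capabilities. The cost and CO2 figures themselves are reported values and are not derived here.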

Q: What are the recommended use cases?

The model is intended for research purposes, particularly in areas such as artwork generation, educational tools, generative model research, and studying AI safety. It's not intended for generating factual content or true representations of people or events.
