MicroDiT


VSehwag24

MicroDiT is a cost-efficient text-to-image diffusion model trained on a micro-budget of $1,890. With 1.16B parameters, it reaches a competitive zero-shot FID of 12.7 on COCO.

Property          Value
Parameter Count   1.16 billion
Training Cost     $1,890
License           Apache 2.0
Paper             arXiv:2407.15811

What is MicroDiT?

MicroDiT is a text-to-image diffusion transformer that challenges the notion that high-quality image generators require massive computational resources. It was designed for cost efficiency and reaches competitive image quality on a training budget of $1,890, a small fraction of what comparable models cost to train.

Implementation Details

The model combines several techniques to cut training cost:

  • Random masking of up to 75% of image patches during training, so the backbone processes far fewer tokens per step
  • Deferred masking: a lightweight patch-mixer processes all patches before masking, so the kept patches still carry information from the dropped ones
  • Mixture-of-experts layers, which add capacity without a proportional increase in per-token compute
  • Progressive training from 256×256 to 512×512 resolution
  • Total training time of 2.6 days on 8×H100 GPUs
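The masking idea can be illustrated with a small sketch of deferred masking: every patch passes through a mixer first, and only then is a random 25% subset kept for the main transformer. The 75% ratio comes from the description above; everything else is an assumption for illustration. The token count assumes a VAE with 8× spatial downsampling and 2×2 latent patches, and `patch_mixer` is a hypothetical averaging stand-in for the real learned patch-mixer.

```python
import random


def num_patches(image_size, vae_downsample=8, patch_size=2):
    """Token count for a square image (VAE stride and patch size are assumptions)."""
    latent_side = image_size // vae_downsample
    return (latent_side // patch_size) ** 2


def patch_mixer(patches):
    """Hypothetical stand-in for the learned patch-mixer.

    Blends each patch with the mean of all patches, so every patch carries
    some global context before any patch is dropped.
    """
    dim = len(patches[0])
    mean = [sum(p[d] for p in patches) / len(patches) for d in range(dim)]
    return [[0.5 * p[d] + 0.5 * mean[d] for d in range(dim)] for p in patches]


def deferred_mask(patches, mask_ratio=0.75, rng=random):
    """Mix all patches first, then keep a random (1 - mask_ratio) subset."""
    mixed = patch_mixer(patches)  # mixing happens BEFORE masking (deferred)
    n_keep = round(len(mixed) * (1 - mask_ratio))
    keep_idx = sorted(rng.sample(range(len(mixed)), n_keep))
    return [mixed[i] for i in keep_idx], keep_idx


if __name__ == "__main__":
    rng = random.Random(0)
    for size in (256, 512):
        n = num_patches(size)
        patches = [[rng.random() for _ in range(4)] for _ in range(n)]
        kept, idx = deferred_mask(patches, rng=rng)
        print(f"{size}px: {n} patches, {len(kept)} passed to the backbone")
```

Under these assumptions a 256×256 image yields 256 patches (64 kept) and a 512×512 image yields 1,024 patches (256 kept). The point of deferring the mask is that the mixer sees the full image, whereas masking first would discard the dropped patches' content entirely.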

Core Capabilities

  • Zero-shot FID of 12.7 on the COCO dataset
  • Generation in multiple styles, including origami, pixel art, line art, and cyberpunk
  • Four pre-trained model variants with different training data configurations
  • Efficient image generation at 512×512 resolution
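The 12.7 figure is a Fréchet Inception Distance (FID): Gaussians are fit to Inception-v3 features of real and generated images, and the distance between them is reported, so lower is better. The sketch below is a simplification that assumes diagonal covariances; the standard metric uses full covariance matrices, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)), and thousands of images. The function names are hypothetical.

```python
import math


def feature_stats(features):
    """Per-dimension mean and unbiased variance of a list of feature vectors."""
    n, dim = len(features), len(features[0])
    mu = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mu[d]) ** 2 for f in features) / (n - 1) for d in range(dim)]
    return mu, var


def fid_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    With diagonal S1, S2 the trace term v1 + v2 - 2*sqrt(v1*v2)
    simplifies to (sqrt(v1) - sqrt(v2))^2 per dimension.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(v1) - math.sqrt(v2)) ** 2 for v1, v2 in zip(var1, var2))
    return mean_term + cov_term


if __name__ == "__main__":
    real = [[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]]
    fake = [[0.5, 1.5], [2.5, 2.5], [1.0, 2.0]]
    print(fid_diagonal(*feature_stats(real), *feature_stats(fake)))
```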

Frequently Asked Questions

Q: What makes this model unique?

MicroDiT achieves performance comparable to larger models at 118× lower training cost than Stable Diffusion models and 14× lower cost than current state-of-the-art approaches. The savings come from its patch-masking strategy and efficient architecture design.
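The figures quoted above support a quick back-of-envelope check. This sketch derives GPU-hours from the 2.6 days on 8×H100 GPUs, an implied hourly rate (assuming the entire $1,890 went to GPU time), and the baseline costs implied by the 118× and 14× ratios. These derived values are arithmetic on the card's numbers, not figures reported by the source.

```python
def cost_breakdown(cost_usd=1890, gpus=8, days=2.6, sd_ratio=118, sota_ratio=14):
    """Derive implied quantities from the stated budget, hardware, and cost ratios."""
    gpu_hours = gpus * days * 24
    return {
        "gpu_hours": gpu_hours,
        # Assumption: the whole budget is GPU rental time.
        "usd_per_gpu_hour": cost_usd / gpu_hours,
        # Baseline costs implied by the ratios (not sourced figures).
        "implied_sd_cost": sd_ratio * cost_usd,
        "implied_sota_cost": sota_ratio * cost_usd,
    }


if __name__ == "__main__":
    for key, value in cost_breakdown().items():
        print(f"{key}: {value:,.2f}")
```

With the defaults this works out to about 499 GPU-hours at roughly $3.79 per H100-hour, with implied baselines of $223,020 and $26,460.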

Q: What are the recommended use cases?

The model is well suited to text-to-image generation when compute or budget is limited. It can generate high-quality images in a variety of styles and works for both real and synthetic image generation tasks.
