stable-diffusion-1.5

Jiali

Stable Diffusion v1.5 is a latent text-to-image diffusion model, fine-tuned on LAION-Aesthetics v2 5+, that generates photorealistic images from text prompts at 512x512 resolution.

Property	Value
Authors	Robin Rombach, Patrick Esser
License	CreativeML OpenRAIL-M
Training Data	LAION-aesthetics v2 5+
Resolution	512x512
Paper	High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022)

What is stable-diffusion-1.5?

Stable Diffusion v1.5 is a latent text-to-image diffusion model. It was initialized from the Stable Diffusion v1.2 checkpoint and fine-tuned for 595k additional steps at 512x512 resolution on the curated LAION-Aesthetics v2 5+ dataset. During this fine-tuning, the text conditioning was dropped for 10% of training samples, which lets the same network learn both conditional and unconditional predictions and so improves classifier-free guidance sampling.
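The interaction between 10% text-conditioning dropout and classifier-free guidance can be sketched in a few lines. This is a minimal illustration, not the training code: noise predictions are plain Python lists standing in for U-Net outputs, the empty string stands in for the "dropped" caption, and the guidance scale of 7.5 is only an illustrative value.

```python
import random
from typing import List

DROP_PROB = 0.1  # fraction of training captions replaced by the empty prompt


def maybe_drop_caption(caption: str, rng: random.Random, p: float = DROP_PROB) -> str:
    """Training-time conditioning dropout: return '' with probability p."""
    return "" if rng.random() < p else caption


def guided_noise(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance, elementwise: eps_u + scale * (eps_c - eps_u)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    captions = ["a photo of a cat"] * 10_000
    dropped = sum(maybe_drop_caption(c, rng) == "" for c in captions)
    print(f"dropped {dropped} of {len(captions)} captions")  # roughly 1,000 expected

    # Toy (hypothetical) noise predictions for the unconditional and conditional passes
    print(guided_noise([0.0, 1.0], [1.0, 1.0], scale=7.5))  # [7.5, 1.0]
```

A scale of 1.0 reproduces the conditional prediction, and larger scales push the sample further toward the prompt; the unconditional prediction is only available at sampling time because the model saw empty prompts during training.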

Implementation Details

The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. Prompts are encoded by a frozen, pretrained CLIP ViT-L/14 text encoder. The autoencoder uses a relative downsampling factor of 8, mapping an H x W x 3 image to an H/8 x W/8 x 4 latent, so a 512x512 image is denoised as a 64x64 latent.
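The effect of the downsampling factor is easy to check numerically. This sketch only does the shape arithmetic (it does not load the autoencoder); the helper names are illustrative, and the multiple-of-8 check is a simplifying assumption that keeps the integer division exact.

```python
from typing import Tuple

DOWNSAMPLE_FACTOR = 8  # relative downsampling factor f of the autoencoder
LATENT_CHANNELS = 4    # latent channels used by Stable Diffusion v1.x


def latent_shape(height: int, width: int, f: int = DOWNSAMPLE_FACTOR,
                 channels: int = LATENT_CHANNELS) -> Tuple[int, int, int]:
    """Map an H x W x 3 image shape to the (H/f, W/f, channels) latent shape."""
    if height % f or width % f:
        raise ValueError(f"height and width must be multiples of {f}")
    return (height // f, width // f, channels)


def compression_ratio(height: int, width: int) -> float:
    """Number of pixel values divided by number of latent values."""
    h, w, c = latent_shape(height, width)
    return (height * width * 3) / (h * w * c)


print(latent_shape(512, 512))       # (64, 64, 4)
print(compression_ratio(512, 512))  # 48.0
```

The diffusion U-Net therefore operates on 48 times fewer values than a pixel-space model at the same resolution, which is where most of the efficiency of latent diffusion comes from.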

Distributed as two checkpoints: an EMA-only file for inference (4.27 GB) and a full file for fine-tuning (7.7 GB)
Trained on 32 x 8 A100 GPUs using the AdamW optimizer
Effective batch size of 2048, using 2 gradient-accumulation steps
Learning rate warmed up to 0.0001 over 10,000 steps, then held constant
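The training figures above can be checked with a short calculation. This sketch assumes the batch breakdown 32 x 8 GPUs x 2 accumulation steps x 4 samples per GPU (the per-GPU value of 4 is an assumption taken from the upstream model card's breakdown) and assumes a linear warmup shape, which the list above does not specify.

```python
GPUS = 32 * 8            # 32 x 8 = 256 A100 GPUs
PER_GPU_BATCH = 4        # assumption: per-GPU micro-batch size
GRAD_ACCUM_STEPS = 2     # gradient accumulation steps
BASE_LR = 1e-4           # target learning rate
WARMUP_STEPS = 10_000    # warmup length


def effective_batch_size(gpus: int = GPUS, per_gpu: int = PER_GPU_BATCH,
                         accum: int = GRAD_ACCUM_STEPS) -> int:
    """Samples contributing to each optimizer update."""
    return gpus * per_gpu * accum


def learning_rate(step: int, base_lr: float = BASE_LR, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup (assumed shape) to base_lr, then constant."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr


print(effective_batch_size())   # 2048
print(learning_rate(10_000))    # 0.0001
print(learning_rate(500_000))   # 0.0001
```

Gradient accumulation lets the 2048-sample batch be reached without fitting it in memory at once: each GPU processes its micro-batch twice before the optimizer step.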

Core Capabilities

High-quality photorealistic image generation from text descriptions
Efficient generation at 512x512 resolution, since denoising runs on a 64x64 latent rather than on pixels
Prompt-guided composition through cross-attention on the CLIP text embeddings
Support for creative and artistic applications
Suitability for research on model behavior, limitations, and biases

Frequently Asked Questions

Q: What makes this model unique?

This model balances quality and efficiency. Running diffusion in a compressed latent space keeps compute manageable, while fine-tuning on the aesthetics-filtered LAION-Aesthetics v2 5+ subset improves visual quality, and 10% text-conditioning dropout improves classifier-free guidance sampling.

Q: What are the recommended use cases?

The model is intended primarily for research: studying the safe deployment of models that can generate harmful content, probing limitations and biases, generating artworks, building educational or creative tools, and researching generative models. It is not intended for creating harmful content, misrepresenting people or events, or deployment without appropriate safety mechanisms.