stable-diffusion-1.5

Jiali

Stable Diffusion v1.5 is a latent text-to-image diffusion model fine-tuned on LAION-aesthetics v2 5+. It generates photorealistic images from text prompts at 512x512 resolution.

Property        Value
Authors         Robin Rombach, Patrick Esser
License         CreativeML OpenRAIL M
Training Data   LAION-aesthetics v2 5+
Resolution      512x512
Paper           High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022)

What is stable-diffusion-1.5?

Stable Diffusion v1.5 is a latent text-to-image diffusion model. It starts from the v1.2 checkpoint and was fine-tuned for a further 595k steps on the curated LAION-aesthetics v2 5+ dataset. During fine-tuning, the text conditioning was dropped 10% of the time, so the model learns both conditional and unconditional predictions, which improves classifier-free guidance sampling.
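The relationship between text-conditioning dropout and classifier-free guidance can be illustrated with a minimal sketch. This is not the model's training or sampling code: the function names are hypothetical, the noise predictions are plain lists of floats rather than tensors, and replacing a dropped prompt with the empty string is an assumption about how "dropout" is realized.

```python
import random
from typing import List, Optional


def maybe_drop_prompt(prompt: str, drop_prob: float = 0.10,
                      rng: Optional[random.Random] = None) -> str:
    """Training-time conditioning dropout (hypothetical helper).

    With probability drop_prob the prompt is replaced by an empty string
    (an assumption), so the model also sees unconditional examples.
    """
    if rng is None:
        rng = random.Random()
    return "" if rng.random() < drop_prob else prompt


def cfg_combine(eps_uncond: List[float], eps_cond: List[float],
                guidance_scale: float) -> List[float]:
    """Classifier-free guidance at sampling time.

    Starts at the unconditional noise prediction and moves along the
    direction of the conditional one, scaled by guidance_scale.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With a guidance scale of 0 the result is the unconditional prediction, with 1 it is the conditional prediction, and larger values extrapolate further toward the prompt. The unconditional branch is only meaningful because some training examples had their conditioning dropped.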

Implementation Details

The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. Text prompts are encoded with a CLIP ViT-L/14 text encoder. The autoencoder has a relative downsampling factor of 8, so a 512x512 image maps to a 64x64 latent representation.
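The effect of the downsampling factor can be checked with a little arithmetic. The sketch below is illustrative only: `latent_shape` is a hypothetical helper, and the default of 4 latent channels is an assumption taken from the upstream model card rather than from this page.

```python
from typing import Tuple


def latent_shape(height: int, width: int, factor: int = 8,
                 latent_channels: int = 4) -> Tuple[int, int, int]:
    """Return (channels, height, width) of the latent for an RGB image.

    factor is the relative downsampling factor of the autoencoder;
    latent_channels=4 is an assumption, not stated in this document.
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the factor")
    return (latent_channels, height // factor, width // factor)


def compression_ratio(height: int, width: int, factor: int = 8,
                      latent_channels: int = 4) -> float:
    """Ratio of RGB pixel values to latent values for one image."""
    c, h, w = latent_shape(height, width, factor, latent_channels)
    return (3 * height * width) / (c * h * w)
```

Under these assumptions, `latent_shape(512, 512)` gives `(4, 64, 64)`, and the diffusion model operates on 48 times fewer values than the pixel-space image contains.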

  • Available as two checkpoints: a 4.27GB EMA-only file for inference and a 7.7GB full file for fine-tuning
  • Trained on 32 x 8 A100 GPUs with AdamW optimizer
  • Batch size of 2048 with gradient accumulation
  • Constant learning rate of 0.0001 after 10,000 warmup steps
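The training settings listed above can be sketched as follows. Several details are assumptions: the warmup is taken to be linear, "32 x 8" is read as 32 nodes with 8 GPUs each, and the gradient accumulation count of 2 comes from the upstream model card rather than from this page. The function names are hypothetical.

```python
def learning_rate(step: int, base_lr: float = 1e-4,
                  warmup_steps: int = 10_000) -> float:
    """Warmup to base_lr over warmup_steps, then hold constant.

    The linear shape of the warmup is an assumption; the source only
    gives the target rate and the number of warmup steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


def per_gpu_micro_batch(total_batch: int = 2048, nodes: int = 32,
                        gpus_per_node: int = 8, accum_steps: int = 2) -> int:
    """Samples each GPU processes per forward/backward pass.

    accum_steps=2 is an assumption; the page only says that gradient
    accumulation is used.
    """
    denom = nodes * gpus_per_node * accum_steps
    if total_batch % denom:
        raise ValueError("total batch does not divide evenly")
    return total_batch // denom
```

Under these assumptions each of the 256 GPUs handles 4 samples per pass, and two accumulated passes across all GPUs make up one optimizer step of 2048 samples.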

Core Capabilities

  • High-quality photorealistic image generation from text descriptions
  • Generation at 512x512 resolution, with diffusion running on 64x64 latents (downsampling factor of 8)
  • Advanced composition handling through latent space operations
  • Support for creative and artistic applications
  • Research-focused features for model analysis and improvement

Frequently Asked Questions

Q: What makes this model unique?

Two training choices distinguish this checkpoint: it was fine-tuned on an aesthetics-filtered dataset (LAION-aesthetics v2 5+), and 10% of the text conditioning was dropped during training to improve classifier-free guidance sampling. Because diffusion runs in a latent space downsampled by a factor of 8 rather than in pixel space, it balances output quality against compute cost.

Q: What are the recommended use cases?

The model is primarily intended for research purposes, including studies of safe deployment, investigation of biases, artistic creation, educational tools, and research on generative models. Out-of-scope uses include creating harmful content, misrepresentation, and commercial use without proper safety mechanisms.
