Stable Diffusion v1.4
| Property | Value |
|---|---|
| Developer | CompVis (Robin Rombach, Patrick Esser) |
| License | CreativeML OpenRAIL-M |
| Training Data | LAION-aesthetics v2 5+ |
| Research Paper | High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022) |
What is stable-diffusion-v-1-4-original?
Stable Diffusion v1.4 is a latent text-to-image diffusion model. Resumed from the v1.2 checkpoint, it was fine-tuned for 225k additional steps at 512x512 resolution on LAION-aesthetics v2 5+, with the text conditioning dropped 10% of the time to improve classifier-free guidance sampling.
Implementation Details
The model employs a latent diffusion architecture that combines an autoencoder with a diffusion model trained in the autoencoder's latent space. It uses a CLIP ViT-L/14 text encoder to process prompts and a downsampling factor of f = 8, so an image of shape H x W x 3 is encoded to a latent of shape H/f x W/f x 4.
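The shape arithmetic above can be sketched as follows; `latent_shape` is an illustrative helper, not part of any library:

```python
def latent_shape(height, width, f=8, latent_channels=4):
    """Map an RGB image shape (H, W, 3) to the VAE latent shape (H/f, W/f, 4)."""
    assert height % f == 0 and width % f == 0, "dimensions must be divisible by f"
    return (height // f, width // f, latent_channels)

# A 512x512 RGB image (786,432 values) is encoded to a 64x64x4 latent
# (16,384 values): a 48x reduction in what the diffusion model must process.
print(latent_shape(512, 512))  # -> (64, 64, 4)
```

Running the denoising loop in this compressed space is what makes latent diffusion far cheaper than diffusing over raw pixels.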
- Training Infrastructure: 32 x 8 x A100 GPUs
- Batch Size: 2048
- Optimizer: AdamW with a learning rate of 1e-4
- Training Data: LAION-aesthetics v2 5+ dataset
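Assuming the global batch of 2048 was split evenly across all devices, the per-GPU batch works out as below; this is back-of-envelope arithmetic from the figures above, not a published configuration detail:

```python
nodes, gpus_per_node = 32, 8   # "32 x 8 x A100 GPUs"
global_batch = 2048

total_gpus = nodes * gpus_per_node          # 256 GPUs in total
per_gpu_batch = global_batch // total_gpus  # samples per GPU per optimizer step
print(total_gpus, per_gpu_batch)  # -> 256 8
```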
Core Capabilities
- High-quality image generation at 512x512 resolution
- Text-guided image synthesis
- Artistic and creative content generation
- Research and educational applications
- Design and artistic workflow integration
Frequently Asked Questions
Q: What makes this model unique?
Its training incorporated 10% text-conditioning dropout, which enables stronger classifier-free guidance at sampling time, together with extended fine-tuning on a curated aesthetics dataset. The result is higher image quality than earlier v1 checkpoints.
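The mechanism can be sketched schematically: during training the prompt embedding is occasionally replaced with a null conditioning, and at sampling time the unconditional and conditional noise predictions are blended. A minimal sketch with scalar stand-ins for the noise predictions; the function names and values are illustrative, not the model's actual API:

```python
import random

def maybe_drop_conditioning(prompt_embedding, p_drop=0.1, rng=random):
    """Training-time: replace the text conditioning with the null
    conditioning (here None) p_drop of the time, e.g. 10%."""
    return None if rng.random() < p_drop else prompt_embedding

def guided_prediction(eps_uncond, eps_cond, guidance_scale):
    """Sampling-time classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# With a guidance scale above 1, the text-conditioned direction is amplified.
print(guided_prediction(0.0, 1.0, 7.5))  # -> 7.5
```

Because the model saw unconditional examples during training, the `eps_uncond` term is meaningful at inference and the extrapolation sharpens prompt adherence.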
Q: What are the recommended use cases?
The model is primarily intended for research purposes, including safe-deployment studies, bias investigation, artistic creation, educational tools, and generative-model research. Generation of harmful or abusive content is explicitly out of scope under the CreativeML OpenRAIL-M license.