stable-diffusion-v1-4

Narsil

Stable Diffusion v1.4 is a text-to-image latent diffusion model, fine-tuned from an earlier v1 checkpoint to improve image aesthetics and classifier-free guidance sampling.

Property                   Value
License                    CreativeML OpenRAIL-M
Authors                    Robin Rombach, Patrick Esser
Training Infrastructure    32 x 8 x A100 GPUs
Base Model                 Stable Diffusion v1.2

What is stable-diffusion-v1-4?

Stable Diffusion v1.4 is a latent text-to-image diffusion model. Its weights were initialized from the Stable Diffusion v1.2 checkpoint and then fine-tuned further. The model pairs a latent diffusion architecture with a CLIP ViT-L/14 text encoder to generate images from text prompts.

Implementation Details

The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. The encoder maps each image to a latent representation with a downsampling factor of 8, so a 512x512 image becomes a 64x64 latent. Training used the AdamW optimizer with a learning rate of 0.0001 and a batch size of 2048, spread across 32 x 8 A100 GPUs (256 in total).
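The arithmetic behind these figures can be checked with a short sketch. The helper name `latent_shape` and the 4-channel latent are assumptions that do not appear in the text above; the GPU count is taken from the "32 x 8 x A100 GPUs" entry in the property table.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an RGB image.

    latent_channels=4 is an assumption; the text only states the factor of 8.
    """
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (latent_channels, height // factor, width // factor)


# A 512x512 RGB image and its latent.
shape = latent_shape(512, 512)
pixel_values = 3 * 512 * 512
latent_values = shape[0] * shape[1] * shape[2]
compression = pixel_values / latent_values

# Batch size of 2048 spread over 32 nodes x 8 GPUs.
gpus = 32 * 8
samples_per_gpu = 2048 // gpus

print(shape, compression, gpus, samples_per_gpu)
```

Under these assumptions the diffusion model works on roughly 48 times fewer values per image than a pixel-space model would, and each GPU contributes 8 samples to every optimizer step.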

  • Uses the CLIP ViT-L/14 text encoder to embed prompts
  • Conditions the UNet backbone on the text embeddings through cross-attention
  • Supports multiple sampling schedulers, including PLMS and K-LMS
  • Produces its best results at 512x512 resolution
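As a rough illustration of the cross-attention item above, the following sketch computes scaled dot-product attention in which queries come from image latent positions and keys and values come from text tokens. It is a simplified toy, not the model's implementation: the learned query, key, and value projections and the multi-head split are omitted, and all names and numbers are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Attend from latent positions (queries) to text tokens (keys/values)."""
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this latent position to every text token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the text-token value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy data: 2 latent positions, 3 text tokens, feature size 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

outputs, weights = cross_attention(queries, keys, values)
```

Each latent position ends up with one attention weight per text token, which is how the prompt can influence different image regions differently.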

Core Capabilities

  • High-quality text-to-image generation
  • Support for artistic and creative applications
  • Basic compositional understanding; complex compositions (for example, "a red cube on top of a blue sphere") remain difficult
  • Classifier-free guidance sampling
  • Research and educational applications
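Classifier-free guidance, listed above, is commonly described as mixing two noise predictions at each sampling step: one made with the prompt and one made with an empty prompt. The sketch below shows that combination on plain Python lists. The function name `guided_noise`, the parameter `guidance_scale`, and the value 7.5 are illustrative assumptions, not settings taken from this page.

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and text-conditional noise predictions.

    guidance_scale = 0 ignores the prompt, 1 reproduces the conditional
    prediction, and larger values push further toward the prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy predictions for a two-element "latent".
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 1.0]

same_as_cond = guided_noise(eps_uncond, eps_cond, 1.0)
amplified = guided_noise(eps_uncond, eps_cond, 7.5)
```

Only the elements where the two predictions disagree are changed, so a higher scale amplifies exactly the part of the prediction that the prompt is responsible for.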

Frequently Asked Questions

Q: What makes this model unique?

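Compared with earlier v1 checkpoints, this model was fine-tuned on images with higher predicted aesthetic scores, and the text conditioning was dropped for 10% of training samples to improve classifier-free guidance sampling. Because diffusion runs in a compressed latent space rather than on full-resolution pixels, the model balances image quality against generation cost.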

Q: What are the recommended use cases?

The model is intended primarily for research, including studies of safe deployment, artistic applications, educational tools, and research on generative models. Generating harmful, offensive, or misleading content is explicitly out of scope.
