sd-vae-ft-ema-original

stabilityai

Improved autoencoder (VAE) for Stable Diffusion, fine-tuned on LAION-Aesthetics for better face reconstruction. This checkpoint ships the EMA weights.

Property          Value
License           MIT
Training Steps    560,001
Model Type        Text-to-Image VAE
Author            StabilityAI

What is sd-vae-ft-ema-original?

sd-vae-ft-ema-original is a fine-tuned version of the autoencoder used by Stable Diffusion. It was trained on a 1:1 mix of the LAION-Aesthetics and LAION-Humans datasets to improve image reconstruction quality, particularly for faces.

Implementation Details

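The model was fine-tuned for 313,198 steps starting from the original autoencoder checkpoint, which accounts for the 560,001 total training steps listed above. It uses EMA weights and an L1 + LPIPS loss. Only the decoder was fine-tuned, so the encoder and the latent space it produces are unchanged. This keeps the model compatible with existing Stable Diffusion checkpoints and lets it act as a drop-in replacement for the original autoencoder.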

  • rFID of 4.42 on the COCO2017 dataset (lower is better)
  • PSNR of 23.8 ± 3.9 (higher is better)
  • rFID of 1.77 on LAION-Aesthetics
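
For readers unfamiliar with the PSNR figure above, the following is a minimal sketch of how peak signal-to-noise ratio is usually computed between an image and its reconstruction. It is not the evaluation script behind the reported numbers. The function name `psnr`, the flat pixel lists, and the 8-bit range of 255 are assumptions made for illustration.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Illustrative only: pixels are given as flat sequences of numbers and
    max_value=255 assumes 8-bit images.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    if not original:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Each pixel is off by 10, so MSE = 100 and PSNR is about 28.13 dB.
example = psnr([100, 100], [110, 90])
```

A score reported as a mean with a ± spread would typically come from computing this per image and then aggregating over the evaluation set.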

Core Capabilities

  • Enhanced face reconstruction quality
  • Improved overall image fidelity
  • Better preservation of fine details
  • Seamless integration with existing Stable Diffusion pipelines

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its fine-tuning on human-centric data and its improved reconstruction metrics, particularly for facial features. The combination of EMA weights and an L1 + LPIPS loss yields higher-quality reconstructions than the original VAE.
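
As a rough illustration of what "EMA weights" means, the sketch below keeps an exponential moving average of a model's parameters alongside the raw training weights. It uses plain Python dictionaries of floats rather than real tensors. The function name `ema_update` and the decay value are assumptions for illustration, not the settings used to train this model.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Blend the current weights into the running average, in place.

    ema_params and model_params map parameter names to numbers.
    The decay value is a hypothetical example, not the model's setting.
    """
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Start the average at the initial weights, then update after each step.
ema = {"w": 0.0}
for step_weights in ({"w": 1.0}, {"w": 1.0}):
    ema_update(ema, step_weights, decay=0.9)
# After two updates toward 1.0 with decay 0.9, ema["w"] is about 0.19.
```

The averaged weights change more smoothly than the raw weights, which is why EMA checkpoints are often the ones distributed for inference.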

Q: What are the recommended use cases?

The model is ideal for high-quality image generation tasks, particularly those involving human subjects. It's specifically designed for use with the original CompVis Stable Diffusion codebase and serves as an enhanced replacement for the standard autoencoder.
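
One way to picture the drop-in replacement is as a key-by-key swap inside a checkpoint's state dictionary. The sketch below assumes that the Stable Diffusion checkpoint stores its autoencoder under the `first_stage_model.` prefix and that the VAE checkpoint uses unprefixed keys; both are assumptions to verify against the actual files. The function name `swap_vae_weights` is hypothetical, and plain dictionaries stand in for real state dicts.

```python
def swap_vae_weights(sd_state, vae_state, prefix="first_stage_model."):
    """Return a copy of sd_state with matching autoencoder entries replaced.

    sd_state:  mapping of parameter names to weights for the full model.
    vae_state: mapping of parameter names to weights for the new VAE.
    prefix:    assumed location of the autoencoder inside sd_state.
    Keys in vae_state with no counterpart in sd_state are skipped.
    """
    merged = dict(sd_state)
    replaced = []
    for key, weights in vae_state.items():
        target = prefix + key
        if target in merged:
            merged[target] = weights
            replaced.append(target)
    return merged, replaced


# Toy example with strings standing in for tensors.
sd = {
    "model.diffusion_model.w": "unet",
    "first_stage_model.decoder.w": "old-decoder",
    "first_stage_model.encoder.w": "encoder",
}
vae = {"decoder.w": "new-decoder", "encoder.w": "encoder", "loss.w": "unused"}
merged, replaced = swap_vae_weights(sd, vae)
```

With real checkpoints the two dictionaries would come from the loaded files, and the merged result would be loaded back into the model. Checking the `replaced` list is a cheap way to confirm that the key names actually lined up.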
