sd-vae-ft-ema-original
| Property | Value |
|---|---|
| License | MIT |
| Training Steps | 560,001 |
| Model Type | Image Autoencoder (VAE) for Stable Diffusion |
| Author | Stability AI |
What is sd-vae-ft-ema-original?
sd-vae-ft-ema-original is an improved autoencoder for Stable Diffusion. It was fine-tuned from the original autoencoder on a 1:1 mixture of the LAION-Aesthetics and LAION-Humans datasets, which noticeably improves image reconstruction quality, particularly for human faces.
Implementation Details
The model was fine-tuned for 313,198 steps on top of the original autoencoder checkpoint (560,001 steps cumulatively), using EMA weights and a combined L1 + LPIPS reconstruction loss. Only the decoder was fine-tuned, so the latent space is unchanged and the model serves as a drop-in replacement for the original autoencoder in existing checkpoints.
- Improved rFID of 4.42 on COCO 2017 (lower is better)
- PSNR of 23.8 ± 3.9 dB on COCO 2017
- rFID of 1.77 on LAION-Aesthetics
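Because only the decoder changes, the VAE keeps Stable Diffusion's usual latent geometry (a 4-channel latent at 1/8 the spatial resolution). The sketch below shows that relationship plus the usual way to swap the fine-tuned VAE into a diffusers pipeline; the model ids and the `diffusers` usage are assumptions based on the Hugging Face hub naming, not part of this model card.

```python
def latent_hw(height, width, factor=8):
    # The SD VAE maps an H x W RGB image to a 4-channel latent
    # at (H/8) x (W/8); the fine-tune does not change this geometry.
    return height // factor, width // factor

def build_pipeline():
    # Assumed API/model ids: requires the `diffusers` package; the
    # diffusers-packaged checkpoint name is an assumption.
    from diffusers import AutoencoderKL, StableDiffusionPipeline
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
    return StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", vae=vae
    )

if __name__ == "__main__":
    print(latent_hw(512, 512))  # prints (64, 64)
```

Since the latents are unchanged, no re-training or checkpoint conversion is needed; the VAE can be substituted at load time.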
Core Capabilities
- Enhanced face reconstruction quality
- Improved overall image fidelity
- Better preservation of fine details
- Seamless integration with existing Stable Diffusion pipelines
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its fine-tuning on a human-centric dataset mixture and its improved reconstruction metrics, particularly for facial features. The EMA weights and the L1 + LPIPS loss configuration yield higher-quality reconstructions than the original VAE.
Q: What are the recommended use cases?
The model is ideal for high-quality image generation tasks, particularly those involving human subjects. It's specifically designed for use with the original CompVis Stable Diffusion codebase and serves as an enhanced replacement for the standard autoencoder.
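For the CompVis codebase, the standalone `.ckpt` is typically spliced into a full Stable Diffusion checkpoint by overwriting the autoencoder weights, which live under the `first_stage_model.` key prefix. A minimal sketch, assuming `torch` is available and with placeholder file paths:

```python
# Sketch: merging the standalone VAE checkpoint into a full Stable Diffusion
# .ckpt for the CompVis codebase. Key prefix is the one that codebase uses
# for the autoencoder; paths are placeholders.

def merge_vae_into_sd(sd_state, vae_state, prefix="first_stage_model."):
    """Overwrite the full checkpoint's autoencoder weights with the fine-tuned ones."""
    merged = dict(sd_state)
    for key, tensor in vae_state.items():
        merged[prefix + key] = tensor
    return merged

def splice_checkpoints(sd_path, vae_path, out_path):
    # Requires torch; filenames below are placeholders, not from the model card.
    import torch
    sd = torch.load(sd_path, map_location="cpu")
    vae = torch.load(vae_path, map_location="cpu")
    sd["state_dict"] = merge_vae_into_sd(sd["state_dict"], vae["state_dict"])
    torch.save(sd, out_path)
```

The merge is purely a state-dict operation, so the rest of the checkpoint (UNet, text encoder) is untouched.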