sd-vae-ft-ema
| Property | Value |
|---|---|
| License | MIT |
| Training Steps | 313,198 |
| Framework | Diffusers |
| Author | StabilityAI |
What is sd-vae-ft-ema?
sd-vae-ft-ema is an improved autoencoder for Stable Diffusion pipelines. It is a fine-tune of the original kl-f8 autoencoder, trained on a 1:1 mix of the LAION-Aesthetics and LAION-Humans datasets, with the goal of improving image reconstruction quality, particularly for human faces.
Implementation Details
The model was trained for 313,198 steps using EMA (Exponential Moving Average) weights and keeps the original loss configuration of L1 + LPIPS. It achieves better reconstruction metrics than the original VAE, with rFID scores (lower is better) of 4.42 on COCO 2017 and 1.77 on LAION-Aesthetics 5+.
- Drop-in replacement for existing Stable Diffusion autoencoders
- Optimized decoder-only fine-tuning approach
- Enhanced performance on face and human reconstructions
- Compatible with 🧨 diffusers library
Core Capabilities
- Superior image reconstruction quality with higher PSNR and SSIM metrics
- Improved face and human figure rendering
- Seamless integration with existing Stable Diffusion pipelines
- Better performance on both COCO2017 and LAION-Aesthetics datasets
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its specialized training on human-centric datasets and its use of EMA weights, resulting in better reconstruction quality while maintaining compatibility with existing Stable Diffusion implementations.
Q: What are the recommended use cases?
The model is particularly well-suited for applications requiring high-quality image reconstruction, especially those involving human subjects. It's ideal for Stable Diffusion pipelines where improved visual fidelity is crucial.