sd-vae-ft-ema

Maintained by: stabilityai

Property        Value
License         MIT
Training Steps  560,001
Framework       Diffusers
Author          StabilityAI

What is sd-vae-ft-ema?

sd-vae-ft-ema is a fine-tuned autoencoder (VAE) for Stable Diffusion pipelines. It starts from the original kl-f8 autoencoder and was fine-tuned on a 1:1 mix of the LAION-Aesthetics and LAION-Humans datasets. The goal is better image reconstruction, particularly of human faces, without losing quality on general images.

Implementation Details

The model was resumed from the original kl-f8 checkpoint and fine-tuned for 313,198 additional steps, which brings the total to the 560,001 training steps listed above. It uses EMA (Exponential Moving Average) weights and keeps the original loss configuration of L1 + LPIPS. Reconstruction metrics improve over the original VAE: rFID (lower is better) is 4.42 on COCO2017 and 1.77 on LAION-Aesthetics 5+.
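To illustrate what "EMA weights" means, here is a minimal sketch of an exponential moving average over model parameters. The function name `ema_update` and the decay value are assumptions chosen for illustration; the source does not state the decay used for this model, and real training code would apply the same rule to tensors rather than plain floats.

```python
def ema_update(ema_params, params, decay):
    """Blend current parameters into the running average.

    ema <- decay * ema + (1 - decay) * current
    """
    return {
        name: decay * ema_params[name] + (1.0 - decay) * params[name]
        for name in params
    }


# Toy example: a single scalar "weight" that stays at 1.0 for three steps.
# decay=0.5 is an illustrative value only, picked so the numbers are easy to follow.
ema = {"w": 0.0}
for current_value in [1.0, 1.0, 1.0]:
    ema = ema_update(ema, {"w": current_value}, decay=0.5)

print(ema["w"])  # 0.875
```

The averaged copy changes more smoothly than the raw weights, which is why the EMA copy is commonly the one saved for inference.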

  • Drop-in replacement for existing Stable Diffusion autoencoders
  • Decoder-only fine-tuning: the encoder is left unchanged, which keeps the latents compatible with existing models
  • Enhanced performance on face and human reconstructions
  • Compatible with 🧨 diffusers library
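As a sketch of the drop-in usage with diffusers, the VAE can be loaded separately and passed to a Stable Diffusion pipeline through the `vae` argument. The base model ID and the helper name `load_pipeline_with_vae` are assumptions for illustration; any compatible Stable Diffusion v1 checkpoint should work the same way. Calling the helper downloads model weights.

```python
VAE_ID = "stabilityai/sd-vae-ft-ema"
# Assumption: an example base checkpoint; substitute the one you actually use.
BASE_MODEL_ID = "CompVis/stable-diffusion-v1-4"


def load_pipeline_with_vae(base_model_id=BASE_MODEL_ID, vae_id=VAE_ID):
    """Build a Stable Diffusion pipeline that uses the fine-tuned VAE."""
    # Imported here so that defining the helper does not require diffusers.
    from diffusers import AutoencoderKL, StableDiffusionPipeline

    vae = AutoencoderKL.from_pretrained(vae_id)
    # Passing vae=... overrides the autoencoder bundled with the base model.
    return StableDiffusionPipeline.from_pretrained(base_model_id, vae=vae)


# Example usage (requires diffusers, torch, and network access):
#   pipe = load_pipeline_with_vae()
#   image = pipe("a portrait photo of a person").images[0]
#   image.save("portrait.png")
```

Because only the autoencoder is swapped, prompts, schedulers, and the rest of the pipeline configuration stay as they were.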

Core Capabilities

  • Superior image reconstruction quality with higher PSNR and SSIM metrics
  • Improved face and human figure rendering
  • Seamless integration with existing Stable Diffusion pipelines
  • Lower rFID than the original autoencoder on both COCO2017 and LAION-Aesthetics 5+
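For readers who want to check reconstruction quality themselves, PSNR can be computed directly from an image and its reconstruction. The following is a minimal sketch assuming 8-bit images held in NumPy arrays; the function name `psnr` is illustrative, and this is not the evaluation script behind the reported metrics.

```python
import numpy as np


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy example: every pixel is off by 25.5 (10% of the 8-bit range).
original = np.zeros((4, 4, 3))
reconstructed = np.full((4, 4, 3), 25.5)
print(psnr(original, reconstructed))
```

Higher PSNR means the reconstruction is closer to the input. SSIM and rFID need dedicated implementations and are not covered by this sketch.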

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its specialized training on human-centric datasets and its use of EMA weights, resulting in better reconstruction quality while maintaining compatibility with existing Stable Diffusion implementations.

Q: What are the recommended use cases?

The model is particularly well-suited for applications requiring high-quality image reconstruction, especially those involving human subjects. It's ideal for Stable Diffusion pipelines where improved visual fidelity is crucial.
