sd-vae-ft-ema
| Property | Value |
|---|---|
| License | MIT |
| Training Steps | 313,198 |
| Framework | Diffusers |
| Author | StabilityAI |
What is sd-vae-ft-ema?
sd-vae-ft-ema is an improved autoencoder for Stable Diffusion pipelines. It is a fine-tune of the original kl-f8 autoencoder, trained on a 1:1 mix of the LAION-Aesthetics and LAION-Humans datasets, with the goal of improving image reconstruction quality, particularly for human faces.
Implementation Details
The model was trained for 313,198 steps using EMA (Exponential Moving Average) weights and keeps the original loss configuration of L1 + LPIPS. It achieves better reconstruction metrics than the original VAE, with rFID scores (lower is better) of 4.42 on COCO 2017 and 1.77 on LAION-Aesthetics 5+.
- Drop-in replacement for existing Stable Diffusion autoencoders
- Optimized decoder-only fine-tuning approach
- Enhanced performance on face and human reconstructions
- Compatible with 🧨 diffusers library
Core Capabilities
- Superior image reconstruction quality with higher PSNR and SSIM metrics
- Improved face and human figure rendering
- Seamless integration with existing Stable Diffusion pipelines
- Better performance on both COCO2017 and LAION-Aesthetics datasets
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its specialized training on human-centric datasets and its use of EMA weights, resulting in better reconstruction quality while maintaining compatibility with existing Stable Diffusion implementations.
Q: What are the recommended use cases?
The model is particularly well-suited for applications requiring high-quality image reconstruction, especially those involving human subjects. It's ideal for Stable Diffusion pipelines where improved visual fidelity is crucial.