sd-vae-ft-ema-original
| Property | Value |
|---|---|
| License | MIT |
| Training Steps | 560,001 |
| Model Type | Image Autoencoder (VAE) for Stable Diffusion |
| Author | Stability AI |
What is sd-vae-ft-ema-original?
sd-vae-ft-ema-original is an improved autoencoder for Stable Diffusion. It was fine-tuned from the original autoencoder on a 1:1 mixture of the LAION-Aesthetics and LAION-Humans datasets, which noticeably improves image reconstruction quality, particularly for human faces.
Implementation Details
The model was fine-tuned for 313,198 steps on top of the original autoencoder checkpoint (560,001 steps cumulatively), using EMA weights and a combined L1 + LPIPS reconstruction loss. Only the decoder was fine-tuned, so the latent space is unchanged and the model serves as a drop-in replacement for the original autoencoder in existing checkpoints.
- Improved rFID of 4.42 on COCO 2017 (lower is better)
- PSNR of 23.8 ± 3.9 dB on COCO 2017
- rFID of 1.77 on LAION-Aesthetics
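Because only the decoder changes, the VAE keeps Stable Diffusion's usual latent geometry (a 4-channel latent at 1/8 the spatial resolution). The sketch below shows that relationship plus the usual way to swap the fine-tuned VAE into a diffusers pipeline; the model ids and the `diffusers` usage are assumptions based on the Hugging Face hub naming, not part of this model card.

```python
def latent_hw(height, width, factor=8):
    # The SD VAE maps an H x W RGB image to a 4-channel latent
    # at (H/8) x (W/8); the fine-tune does not change this geometry.
    return height // factor, width // factor

def build_pipeline():
    # Assumed API/model ids: requires the `diffusers` package; the
    # diffusers-packaged checkpoint name is an assumption.
    from diffusers import AutoencoderKL, StableDiffusionPipeline
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
    return StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", vae=vae
    )

if __name__ == "__main__":
    print(latent_hw(512, 512))  # prints (64, 64)
```

Since the latents are unchanged, no re-training or checkpoint conversion is needed; the VAE can be substituted at load time.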
Core Capabilities
- Enhanced face reconstruction quality
- Improved overall image fidelity
- Better preservation of fine details
- Seamless integration with existing Stable Diffusion pipelines
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its fine-tuning on a human-centric dataset mixture and its improved reconstruction metrics, particularly for facial features. The EMA weights and the L1 + LPIPS loss configuration yield higher-quality reconstructions than the original VAE.
Q: What are the recommended use cases?
The model is ideal for high-quality image generation tasks, particularly those involving human subjects. It's specifically designed for use with the original CompVis Stable Diffusion codebase and serves as an enhanced replacement for the standard autoencoder.
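For the CompVis codebase, the standalone `.ckpt` is typically spliced into a full Stable Diffusion checkpoint by overwriting the autoencoder weights, which live under the `first_stage_model.` key prefix. A minimal sketch, assuming `torch` is available and with placeholder file paths:

```python
# Sketch: merging the standalone VAE checkpoint into a full Stable Diffusion
# .ckpt for the CompVis codebase. Key prefix is the one that codebase uses
# for the autoencoder; paths are placeholders.

def merge_vae_into_sd(sd_state, vae_state, prefix="first_stage_model."):
    """Overwrite the full checkpoint's autoencoder weights with the fine-tuned ones."""
    merged = dict(sd_state)
    for key, tensor in vae_state.items():
        merged[prefix + key] = tensor
    return merged

def splice_checkpoints(sd_path, vae_path, out_path):
    # Requires torch; filenames below are placeholders, not from the model card.
    import torch
    sd = torch.load(sd_path, map_location="cpu")
    vae = torch.load(vae_path, map_location="cpu")
    sd["state_dict"] = merge_vae_into_sd(sd["state_dict"], vae["state_dict"])
    torch.save(sd, out_path)
```

The merge is purely a state-dict operation, so the rest of the checkpoint (UNet, text encoder) is untouched.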