sd-vae-ft-ema-original

Maintained By
stabilityai

| Property | Value |
|---|---|
| License | MIT |
| Training Steps | 560,001 |
| Model Type | Text-to-Image VAE |
| Author | StabilityAI |

What is sd-vae-ft-ema-original?

sd-vae-ft-ema-original is an improved autoencoder (VAE) for Stable Diffusion implementations. Fine-tuned on a 1:1 mixture of the LAION-Aesthetics and LAION-Humans datasets, it delivers markedly better image reconstruction quality than the original autoencoder, particularly for human faces.

Implementation Details

Starting from the original autoencoder, the model was fine-tuned for a further 313,198 steps (560,001 total) using EMA weights and a combined L1 + LPIPS loss. Only the decoder was fine-tuned, so the latent space is unchanged and the model serves as a drop-in replacement for the original autoencoder.
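Because only the decoder changed, swapping this VAE into a CompVis-style checkpoint amounts to overwriting the `first_stage_model.*` entries of the state dict. A minimal sketch (the helper name and checkpoint filenames in the usage comment are illustrative, not part of the official tooling):

```python
def splice_vae(sd_state_dict: dict, vae_state_dict: dict) -> dict:
    """Overwrite the first-stage (VAE) weights in a CompVis-style
    Stable Diffusion state dict with fine-tuned VAE weights."""
    merged = dict(sd_state_dict)  # shallow copy; tensors are shared, not cloned
    for key, tensor in vae_state_dict.items():
        # CompVis checkpoints store the VAE under the "first_stage_model." prefix
        merged["first_stage_model." + key] = tensor
    return merged

# Illustrative usage (filenames are assumptions):
# import torch
# sd = torch.load("sd-v1-4.ckpt", map_location="cpu")["state_dict"]
# vae = torch.load("vae-ft-ema-560000-ema-pruned.ckpt", map_location="cpu")["state_dict"]
# merged = splice_vae(sd, vae)
```

All non-VAE weights (UNet, text encoder) pass through untouched, which is what makes the replacement "drop-in".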

  • Improved rFID scores (4.42 on COCO2017 dataset)
  • Enhanced PSNR of 23.8 ±3.9
  • Superior performance on LAION-Aesthetics with rFID of 1.77

Core Capabilities

  • Enhanced face reconstruction quality
  • Improved overall image fidelity
  • Better preservation of fine details
  • Seamless integration with existing Stable Diffusion pipelines

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its specialized training on human-centric datasets and improved reconstruction metrics, particularly for facial features. The use of EMA weights and specific loss configurations results in higher quality output compared to the original VAE.

Q: What are the recommended use cases?

The model is ideal for high-quality image generation tasks, particularly those involving human subjects. It's specifically designed for use with the original CompVis Stable Diffusion codebase and serves as an enhanced replacement for the standard autoencoder.
