sd-vae-ft-mse-original

Maintained by: stabilityai

License: MIT
Training Steps: 840,001
Model Type: Variational Autoencoder (VAE)
Author: StabilityAI

What is sd-vae-ft-mse-original?

sd-vae-ft-mse-original is an improved variational autoencoder (VAE) for Stable Diffusion. It is a fine-tuned version of the original VAE, trained on a combination of the LAION-Aesthetics and LAION-Humans datasets. Training ran for 840,001 steps with a modified loss that emphasizes MSE reconstruction (MSE + 0.1 * LPIPS).
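The loss weighting above can be sketched in a few lines. This is a minimal illustration, not the training code: images are flattened lists of floats, and `perceptual_distance` is a hypothetical stand-in for an LPIPS model, which in practice is a learned network.

```python
from typing import Callable, Sequence

Image = Sequence[float]  # flattened pixel values, for illustration only


def mse(x: Image, x_hat: Image) -> float:
    """Mean squared error between an image and its reconstruction."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def reconstruction_loss(
    x: Image,
    x_hat: Image,
    perceptual_distance: Callable[[Image, Image], float],
    lpips_weight: float = 0.1,
) -> float:
    """Combine the two terms as MSE + 0.1 * LPIPS.

    `perceptual_distance` is an assumed callable standing in for LPIPS.
    """
    return mse(x, x_hat) + lpips_weight * perceptual_distance(x, x_hat)
```

With the perceptual term weighted at 0.1, the pixel-wise MSE term dominates whenever the two terms are of similar magnitude, which is consistent with the smoother outputs described for this model.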

Implementation Details

The model was developed in two stages: the original kl-f8 autoencoder was first fine-tuned using EMA weights, and the result was then further refined with MSE-focused training. Training used 16 A100 GPUs with a batch size of 12 per GPU, for a total batch size of 192.
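For readers unfamiliar with EMA weights, the idea can be sketched as follows. This is an illustrative sketch only: the weights are plain lists of floats, and the `decay` value is a hypothetical choice, since the source does not state the decay used in training.

```python
from typing import List


def ema_update(
    ema_weights: List[float],
    live_weights: List[float],
    decay: float = 0.999,  # hypothetical value, not taken from the source
) -> List[float]:
    """Return EMA weights after one training step.

    Each EMA weight moves a small fraction (1 - decay) toward the
    corresponding live (currently trained) weight.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    if len(ema_weights) != len(live_weights):
        raise ValueError("weight lists must have the same length")
    return [
        decay * ema + (1.0 - decay) * live
        for ema, live in zip(ema_weights, live_weights)
    ]
```

The averaged copy changes more slowly than the live weights, so it smooths out step-to-step noise in the optimization.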

  • Trained on a 1:1 ratio of LAION-Aesthetics and LAION-Humans datasets
  • Uses EMA (Exponential Moving Average) weights
  • Achieves a PSNR of 24.5 ± 3.7 on COCO 2017
  • Achieves an SSIM of 0.71 ± 0.13
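As a reference for the PSNR figure above, the metric can be computed as shown in this minimal sketch. It assumes 8-bit images flattened into lists of pixel values (peak value 255) and is not the evaluation script behind the reported numbers.

```python
import math
from typing import Sequence


def psnr(
    original: Sequence[float],
    reconstruction: Sequence[float],
    max_value: float = 255.0,  # assumes 8-bit pixel values
) -> float:
    """Peak signal-to-noise ratio in decibels; higher means closer to the original."""
    if len(original) != len(reconstruction) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is a logarithmic function of the MSE, training that lowers MSE raises PSNR.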

Core Capabilities

  • Produces smoother image outputs compared to previous versions
  • Improved face reconstruction quality
  • Better overall image reconstruction metrics
  • Works as a drop-in replacement for the VAE in existing Stable Diffusion models

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its MSE-focused training and its improved reconstruction quality, particularly for human faces. Training with the combined MSE and LPIPS loss yields notably smoother outputs while preserving detail.

Q: What are the recommended use cases?

The model is particularly well-suited for applications requiring high-quality image reconstruction, especially those involving human subjects. It's designed as a drop-in replacement for the original Stable Diffusion VAE, making it ideal for enhancing existing Stable Diffusion pipelines.
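A drop-in swap might look like the following sketch, which uses the Hugging Face `diffusers` library. Two things here are assumptions rather than statements from this page: the repository ID `stabilityai/sd-vae-ft-mse` (the diffusers-format sibling of this checkpoint, since the "-original" files target the original Stable Diffusion codebase) and the base model `CompVis/stable-diffusion-v1-4`. Calling the function requires `diffusers` to be installed and downloads the model weights.

```python
def load_pipeline_with_vae(
    base_model: str = "CompVis/stable-diffusion-v1-4",  # assumed base model
    vae_model: str = "stabilityai/sd-vae-ft-mse",  # assumed diffusers-format VAE repo
):
    """Build a Stable Diffusion pipeline that uses the fine-tuned VAE."""
    # Imported lazily so this module can be loaded without diffusers installed.
    from diffusers import AutoencoderKL, StableDiffusionPipeline

    vae = AutoencoderKL.from_pretrained(vae_model)
    # Passing `vae=` overrides the VAE bundled with the base model.
    return StableDiffusionPipeline.from_pretrained(base_model, vae=vae)
```

Only the autoencoder is replaced; the text encoder and U-Net of the base model stay as they are.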
