sd-vae-ft-mse

Maintained By
stabilityai

| Property | Value |
|---|---|
| License | MIT |
| Author | StabilityAI |
| Training Steps | 840,001 |
| Framework | Diffusers |

What is sd-vae-ft-mse?

The sd-vae-ft-mse is an improved variational autoencoder (VAE) specifically designed for Stable Diffusion models. It represents a significant advancement over the original VAE, having been fine-tuned on a combination of LAION-Aesthetics and LAION-Humans datasets with an emphasis on MSE (Mean Squared Error) loss for superior image reconstruction quality.

Implementation Details

This model was developed in two stages: it was first trained for 313,198 steps as ft-EMA, then continued for another 280,000 steps with a modified loss that emphasizes MSE reconstruction (MSE + 0.1 * LPIPS). Training used 16 A100 GPUs with a batch size of 12 per GPU, for a total of 192 samples per batch.
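The combined objective can be sketched in plain Python. This is a minimal illustration, not the training code: `lpips_value` stands in for the output of a perceptual-similarity network (e.g. the `lpips` package), which is not reimplemented here, and inputs are flattened pixel sequences.

```python
def reconstruction_loss(recon, target, lpips_value, lpips_weight=0.1):
    """Pixel-wise MSE between reconstruction and target, plus a weighted
    perceptual term, mirroring the MSE + 0.1 * LPIPS objective."""
    n = len(recon)
    mse = sum((r - t) ** 2 for r, t in zip(recon, target)) / n
    return mse + lpips_weight * lpips_value
```

Weighting LPIPS at 0.1 keeps MSE as the dominant term, which is what pushes the decoder toward the smoother reconstructions this checkpoint is known for.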

  • Trained on a 1:1 ratio of LAION-Aesthetics and LAION-Humans datasets
  • Implements EMA (Exponential Moving Average) weights
  • Focuses on smoother output generation
  • Compatible as a drop-in replacement for existing autoencoders
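The drop-in replacement works by passing the fine-tuned VAE to a Diffusers pipeline. A minimal sketch follows; the base model id `runwayml/stable-diffusion-v1-5` is an example (any SD 1.x/2.x checkpoint works), and the import is deferred inside the function so the sketch stays importable without `diffusers` installed.

```python
def load_pipeline(base_model="runwayml/stable-diffusion-v1-5"):
    # Deferred import: diffusers is only needed when the pipeline is built.
    from diffusers import AutoencoderKL, StableDiffusionPipeline

    # Swap the fine-tuned VAE into an otherwise unchanged pipeline.
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
    return StableDiffusionPipeline.from_pretrained(base_model, vae=vae)
```

After `pipe = load_pipeline()`, generation proceeds exactly as before; only the decoder that turns latents into pixels has changed.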

Core Capabilities

  • Improved PSNR scores (24.5 ±3.7 on COCO2017)
  • Enhanced SSIM metrics (0.71 ±0.13)
  • Better face reconstruction quality
  • Smoother overall image outputs
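For reference, PSNR (the reconstruction metric quoted above) is just mean squared error measured against the pixel range on a log scale. A stdlib-only sketch for flattened images scaled to [0, 1]:

```python
import math

def psnr(recon, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    n = len(recon)
    mse = sum((r - t) ** 2 for r, t in zip(recon, target)) / n
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

The reported 24.5 dB is an average over COCO 2017 reconstructions, which is why it carries a ±3.7 spread across images.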

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized fine-tuning approach that emphasizes MSE loss, resulting in smoother and more accurate image reconstructions, particularly for human faces and detailed imagery.

Q: What are the recommended use cases?

The model is best suited for Stable Diffusion pipelines where high-quality image reconstruction is crucial, especially when working with human subjects or detailed scenes requiring precise detail preservation.
