# sd-vae-ft-mse

| Property | Value |
|---|---|
| License | MIT |
| Author | StabilityAI |
| Training Steps | 840,001 |
| Framework | Diffusers |
## What is sd-vae-ft-mse?
The sd-vae-ft-mse is an improved variational autoencoder (VAE) designed as a replacement for the original autoencoder in Stable Diffusion models. It was fine-tuned on a combination of the LAION-Aesthetics and LAION-Humans datasets with an emphasis on MSE (Mean Squared Error) loss, yielding superior image reconstruction quality.
## Implementation Details
This model was developed through a two-stage training process: it was first trained for 313,198 steps as ft-EMA, then continued for another 280,000 steps with a modified loss that emphasizes MSE reconstruction (MSE + 0.1 * LPIPS). Training used 16 A100 GPUs with a batch size of 12 per GPU, for an effective batch size of 192 samples.
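The weighted objective from the second stage can be sketched as follows. This is a minimal illustration of the loss weighting, not the training code: `perceptual_distance` here is a stand-in value for the output of a learned LPIPS network (e.g. the `lpips` package), which is out of scope for this sketch.

```python
import numpy as np

def mse(recon, target):
    """Pixel-wise mean squared error between two images in [0, 1]."""
    return float(np.mean((recon - target) ** 2))

def ft_mse_loss(recon, target, perceptual_distance):
    """Second-stage reconstruction loss described above: MSE + 0.1 * LPIPS.

    `perceptual_distance` stands in for the scalar output of an LPIPS
    network evaluated on (recon, target); a real implementation would
    compute it with a pretrained perceptual model.
    """
    return mse(recon, target) + 0.1 * perceptual_distance
```

The heavy MSE weighting (relative to the 0.1 LPIPS term) is what biases the decoder toward smoother, lower-error reconstructions.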
- Trained on a 1:1 ratio of LAION-Aesthetics and LAION-Humans datasets
- Implements EMA (Exponential Moving Average) weights
- Focuses on smoother output generation
- Compatible as a drop-in replacement for existing autoencoders
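Because the model is a drop-in replacement, swapping it into a Diffusers pipeline is a one-line change. The sketch below loads the VAE and passes it to a Stable Diffusion pipeline; `runwayml/stable-diffusion-v1-5` is an assumed base model (any SD 1.x pipeline works), and imports are done lazily inside the function since running it requires `diffusers`, `torch`, and a network connection to download weights.

```python
def load_pipeline_with_ft_mse_vae(device="cuda"):
    """Build a Stable Diffusion pipeline with the ft-MSE VAE swapped in.

    Imports are deferred so the sketch can be read/defined without the
    heavy dependencies installed; calling it needs diffusers + torch
    and downloads the model weights.
    """
    from diffusers import AutoencoderKL, StableDiffusionPipeline

    # Load the fine-tuned VAE on its own...
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

    # ...then hand it to the pipeline in place of the default autoencoder.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed base model, any SD 1.x
        vae=vae,
    )
    return pipe.to(device)
```

No other pipeline changes are needed; the VAE only affects how latents are encoded and decoded, so prompts, schedulers, and checkpoints are untouched.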
## Core Capabilities
- Improved PSNR scores (24.5 ±3.7 on COCO2017)
- Enhanced SSIM metrics (0.71 ±0.13)
- Better face reconstruction quality
- Smoother overall image outputs
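For context on the PSNR figure above, here is how PSNR is commonly computed for images normalized to [0, 1]; this is a generic sketch of the metric, not the evaluation script used for the COCO2017 numbers.

```python
import numpy as np

def psnr(reference, reconstruction, max_val=1.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    PSNR = 10 * log10(max_val^2 / MSE); identical images give infinity.
    """
    err = np.mean((reference - reconstruction) ** 2)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val**2 / err))
```

An MSE of 0.01 on [0, 1] images corresponds to 20 dB, so the reported 24.5 dB implies a noticeably lower average reconstruction error.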
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its specialized fine-tuning approach that emphasizes MSE loss, resulting in smoother and more accurate image reconstructions, particularly for human faces and detailed imagery.
**Q: What are the recommended use cases?**
The model is best suited for Stable Diffusion pipelines where high-quality image reconstruction is crucial, especially when working with human subjects or detailed scenes requiring precise detail preservation.