# EQ-VAE-EMA
| Property | Value |
|---|---|
| Paper | arXiv:2502.09509 |
| Author | zelaki |
| Model Type | Regularized VAE |
| Performance Metrics | FID: 0.552, PSNR: 26.158, LPIPS: 0.133, SSIM: 0.725 |
## What is eq-vae-ema?
EQ-VAE-EMA is an enhanced version of SD-VAE that introduces equivariance regularization in the latent space. The model was finetuned for 44 epochs on ImageNet with EMA (Exponential Moving Average) weights, and is designed to improve generative image modeling by making the latent space equivariant to scaling and rotation: the latent representation transforms together with the image, rather than merely ignoring the transformation.
## Implementation Details
The model enforces equivariance under scaling and rotation transformations in the latent space: applying such a transformation to an image and then encoding it should give the same result as encoding first and then transforming the latent. It builds on the SD-VAE architecture while adding this regularization to improve image generation quality.
- Regularized latent space for improved transformation handling
- EMA weight implementation for stable training
- 44-epoch finetuning on ImageNet dataset
- Strong reconstruction metrics on the ImageNet validation set
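As a rough illustration of the regularization objective, the sketch below uses toy stand-ins for the encoder and decoder (2x2 average pooling and nearest-neighbor upsampling; the real model uses trained SD-VAE networks, and the exact loss formulation in the paper may differ). The equivariance term penalizes the difference between decoding a transformed latent and the transformed input:

```python
import numpy as np

# Toy stand-ins for the VAE encoder/decoder (illustrative only):
# encode downsamples by 2x2 average pooling, decode upsamples by repetition.
def encode(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(z):
    return np.repeat(np.repeat(z, 2, axis=0), 2, axis=1)

def eq_loss(x, tau):
    # Equivariance term: decoding a transformed latent should match
    # the transformed input (here tau is a 90-degree rotation).
    z = encode(x)
    return np.mean((decode(tau(z)) - tau(x)) ** 2)

rng = np.random.default_rng(0)
x = rng.random((8, 8))
loss = eq_loss(x, np.rot90)
```

Because this toy encoder/decoder pair happens to commute exactly with 90-degree rotation, the equivariance term here reduces to the plain reconstruction error; in a real network, the term actively pushes the latent space toward that behavior.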
## Core Capabilities
- High-quality image reconstruction, with a reconstruction FID of 0.552
- Strong perceptual similarity maintenance (LPIPS: 0.133)
- Robust structural similarity preservation (SSIM: 0.725)
- Latent representations that transform consistently under scaling and rotation
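Of the reported metrics, PSNR is the simplest to reproduce; a minimal sketch follows (the image pair is synthetic, with an error level chosen to land near the reported ~26 dB range):

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    # Peak signal-to-noise ratio in dB between a reference image x
    # and its reconstruction y, both scaled to [0, max_val].
    mse = np.mean((np.asarray(x) - np.asarray(y)) ** 2)
    return 20 * np.log10(max_val) - 10 * np.log10(mse)

# A uniform error of 0.05 on a [0, 1] image gives MSE = 0.0025,
# i.e. about 26 dB -- the same range as the reported reconstruction PSNR.
x = np.zeros((4, 4))
y = x + 0.05
```

The reported value would be an average of this quantity over ImageNet validation reconstructions; LPIPS and FID require pretrained networks and are not reproduced here.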
## Frequently Asked Questions
Q: What makes this model unique?
EQ-VAE-EMA stands out through its specialized equivariance regularization technique, which helps maintain consistent image representations across different scales and rotations. This approach leads to more robust and reliable image generation compared to traditional VAE models.
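The "consistent representations" claim can be illustrated with a toy equivariant encoder (2x2 average pooling as a stand-in for the trained encoder): rotating an image and then encoding it yields the same latent as encoding first and then rotating the latent.

```python
import numpy as np

def toy_encode(x):
    # Stand-in encoder: 2x2 average pooling. The real EQ-VAE encoder is a
    # trained network that is *regularized toward* this kind of behavior.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = np.arange(64.0).reshape(8, 8)
lhs = toy_encode(np.rot90(x))   # rotate the image, then encode
rhs = np.rot90(toy_encode(x))   # encode, then rotate the latent
```

For an ordinary VAE, `lhs` and `rhs` generally differ; the equivariance regularization pushes them to agree.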
Q: What are the recommended use cases?
The model is particularly well-suited for applications requiring high-quality image generation and reconstruction, especially when dealing with variations in scale and rotation. It's ideal for tasks such as image manipulation, generation, and transformation where maintaining structural integrity is crucial.