LTX Video 0.9 VAE Finetuned

Property	Value
Author	spacepxl
Model URL	Hugging Face
License	Pending commercial permissive license from Lightricks

What is ltx-video-0.9-vae-finetune?

This is a finetuned version of the LTX Video 0.9 VAE, trained to reduce the checkerboard artifacts that the original VAE commonly produces. It consists of a finetuned decoder and an optional finetuned encoder, and it remains compatible with the original latent space.

Implementation Details

Training ran in two phases: first the decoder was finetuned while the latent space was kept intact, then the encoder received limited training with the decoder frozen. The architecture uses strided convolutions in the encoder and pixel shuffle upscaling in the decoder, a combination that makes the artifacts inherently difficult to eliminate completely.
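To illustrate why pixel shuffle upscaling tends to produce grid patterns, here is a minimal sketch in plain Python. It is a 2D, single-output-channel simplification (the real VAE works on video tensors), and the function name and toy values are illustrative assumptions, not code from the model. Each output pixel inside an r x r block is copied from a different input channel, so any small mismatch between channels repeats with period r.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r planes of size H x W into one (H*r) x (W*r) plane.

    Plane k = dy * r + dx is written to output[y*r + dy][x*r + dx],
    the usual depth-to-space index convention.
    """
    assert len(channels) == r * r
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for k, plane in enumerate(channels):
        dy, dx = divmod(k, r)
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = plane[y][x]
    return out


# Toy input: four 2x2 planes that should all be 1.0 for a flat image.
# The last plane is slightly off (1.2), standing in for a small learned bias.
planes = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
planes.append([[1.2, 1.2], [1.2, 1.2]])

image = pixel_shuffle(planes, r=2)
for row in image:
    print(row)
```

In the printed 4x4 result, the value 1.2 appears at every odd row and odd column, which is the kind of regular grid that shows up visually as a checkerboard.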

Two model versions are available: one with only the finetuned decoder, and one with both the finetuned decoder and encoder
Maintains compatibility with the original diffusion model
Partially successful in reducing artifact strength
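The two-phase schedule described above can be sketched with a toy scalar autoencoder. This is a hedged illustration only: the scalar "weights", the mean squared error objective, the learning rate, and the step counts are all assumptions made for the example, and the actual training loss and hyperparameters are not stated here. The point is which side is updated in each phase, and that latents cannot change while the encoder is frozen.

```python
def train(enc, dec, data, train_encoder, train_decoder, lr=0.05, steps=200):
    """Gradient descent on mean squared error for x_hat = dec * (enc * x)."""
    for _ in range(steps):
        g_enc = g_dec = 0.0
        for x in data:
            z = enc * x                # latent
            err = dec * z - x          # reconstruction error
            g_dec += 2 * err * z / len(data)
            g_enc += 2 * err * dec * x / len(data)
        # Only the parts that are not frozen receive an update.
        if train_decoder:
            dec -= lr * g_dec
        if train_encoder:
            enc -= lr * g_enc
    return enc, dec


data = [0.5, 1.0, 1.5]
enc0, dec0 = 0.8, 1.0  # stand-ins for the original weights

# Phase 1: encoder frozen, decoder finetuned. Latents z = enc * x are unchanged.
enc1, dec1 = train(enc0, dec0, data, train_encoder=False, train_decoder=True)

# Phase 2: decoder frozen, encoder trained for a limited number of steps.
enc2, dec2 = train(enc1, dec1, data, train_encoder=True, train_decoder=False, steps=20)

print(enc1 == enc0, dec2 == dec1)
```

Because the encoder is untouched in phase 1, a diffusion model trained on the original latents sees exactly the same latent distribution; phase 2 is kept short so the encoder does not drift far from it.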

Core Capabilities

Reduced checkerboard artifacts compared to the original model
Compatible with i2v (image-to-video) generation
Preserved latent space characteristics
Flexible deployment with two different versions

Frequently Asked Questions

Q: What makes this model unique?

This model specifically addresses the checkerboard artifact issue in the original LTX Video 0.9 VAE while maintaining compatibility with the original latent space, offering users the choice between two versions depending on their specific needs.

Q: What are the recommended use cases?

The model suits video generation tasks where reducing checkerboard artifacts matters, particularly i2v applications. Users can choose the version with only the finetuned decoder for minimal changes, or the full version with both the finetuned encoder and decoder for the stronger artifact reduction of the two.
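As a rough sketch of the difference between the two versions, the choice can be thought of as deciding which sub-module weights replace the originals. The key prefixes `encoder.` and `decoder.`, the helper `merge_weights`, and the placeholder values below are hypothetical; how the released files are actually packaged is not described here.

```python
def merge_weights(original, finetuned, use_finetuned_encoder):
    """Return a copy of `original` with selected sub-module weights swapped in."""
    # The decoder is always replaced; the encoder only on request.
    prefixes = ("decoder.",) + (("encoder.",) if use_finetuned_encoder else ())
    merged = dict(original)
    for key, value in finetuned.items():
        if key.startswith(prefixes):
            merged[key] = value
    return merged


# Placeholder strings stand in for real weight tensors.
original = {"encoder.conv.weight": "enc_orig", "decoder.conv.weight": "dec_orig"}
finetuned = {"encoder.conv.weight": "enc_ft", "decoder.conv.weight": "dec_ft"}

decoder_only = merge_weights(original, finetuned, use_finetuned_encoder=False)
full = merge_weights(original, finetuned, use_finetuned_encoder=True)

print(decoder_only)
print(full)
```

With the decoder-only option, encoding behaves exactly as in the original VAE and only decoding changes; the full option changes both directions.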