Text2LIVE
| Property | Value |
|---|---|
| Paper | View on arXiv |
| Framework | PyTorch (>=1.10.0) |
| Author | Antiraedus |
What is Text2LIVE?
Text2LIVE is a groundbreaking framework for text-driven editing of real-world images and videos without requiring pre-training or user-provided edit masks. The model specializes in zero-shot appearance manipulation, allowing users to edit object textures or add visual effects through simple text prompts while maintaining high fidelity to the original input.
Implementation Details
The model works by generating an edit layer (color + opacity) that is composited over the original input, rather than generating the edited output directly, as sketched below. It uses CLIP to define its text-driven losses and trains on an internal dataset extracted from a single input image or video. The implementation requires significant GPU resources (a Tesla V100 32GB is recommended) and does not currently support mixed-precision training.
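To illustrate the layered formulation, here is a minimal, hypothetical PyTorch sketch of the compositing step: an edit layer with RGB color and an opacity map, both produced by the generator, is alpha-blended over the input. The tensor names and the stand-in `generator` module are illustrative assumptions, not the released code.

```python
import torch
import torch.nn as nn

def composite_edit_layer(image: torch.Tensor, edit_rgba: torch.Tensor) -> torch.Tensor:
    """Alpha-blend a predicted edit layer over the original image.

    image:     (B, 3, H, W) original input in [0, 1]
    edit_rgba: (B, 4, H, W) generator output; channels 0-2 are the edit
               color, channel 3 is the per-pixel opacity (alpha) map.
    """
    color = edit_rgba[:, :3]      # RGB of the edit layer
    alpha = edit_rgba[:, 3:4]     # opacity in [0, 1]
    return alpha * color + (1.0 - alpha) * image

# Illustrative usage with a stand-in generator (the real model is a CNN
# trained per input with CLIP-driven losses).
generator = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.Sigmoid())
image = torch.rand(1, 3, 256, 256)
edited = composite_edit_layer(image, generator(image))   # (1, 3, 256, 256)
```

Because only the edit layer is generated, regions where the opacity is near zero pass the original pixels through unchanged, which is what keeps the output faithful to the input. Beyond this compositing step, the key features of the approach are: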
- Zero-shot editing capability without pre-trained generators
- Internal learning approach using CLIP-driven losses
- Supports both image and video editing
- Utilizes Neural Layered Atlases for temporal consistency in videos
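For videos, the approach builds on Neural Layered Atlases, where each frame pixel has UV coordinates into a shared 2D atlas; applying the edit once in atlas space and mapping it back to every frame is what yields temporal consistency. The snippet below is a rough sketch of that mapping step using `grid_sample`; the atlas, the per-frame UV maps, and their shapes are assumptions for illustration, not the actual pipeline.

```python
import torch
import torch.nn.functional as F

def map_atlas_edit_to_frames(edited_atlas: torch.Tensor, uv_maps: torch.Tensor) -> torch.Tensor:
    """Sample an edited atlas back into each video frame.

    edited_atlas: (1, 3, Ha, Wa) atlas texture with the edit already applied.
    uv_maps:      (T, H, W, 2) per-frame atlas coordinates in [-1, 1]
                  (the convention expected by grid_sample).
    Returns:      (T, 3, H, W) frames that all share the same edit.
    """
    T = uv_maps.shape[0]
    atlas = edited_atlas.expand(T, -1, -1, -1)   # reuse the same atlas for every frame
    return F.grid_sample(atlas, uv_maps, mode="bilinear", align_corners=True)

# Illustrative shapes: a 10-frame clip sampled from a 512x512 atlas.
frames = map_atlas_edit_to_frames(torch.rand(1, 3, 512, 512),
                                  torch.rand(10, 128, 128, 2) * 2 - 1)
```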
Core Capabilities
- Localized and global texture editing of existing objects
- Addition of semi-transparent effects (smoke, fire, snow)
- High-resolution image and video processing
- Semantic-aware editing preserving original spatial layout
Frequently Asked Questions
Q: What makes this model unique?
Text2LIVE's uniqueness lies in its ability to perform zero-shot edits without pre-training or masks, while maintaining high fidelity to the original input through its novel layered approach and text-driven losses.
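As a rough illustration of what a CLIP-driven loss can look like, the sketch below scores a composited result against a target text prompt using an off-the-shelf CLIP model from Hugging Face `transformers`. The actual Text2LIVE objective combines several such terms (including losses on the edit layer itself), so treat this as a simplified, assumption-laden example rather than the paper's exact loss.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_text_loss(images: torch.Tensor, prompt: str) -> torch.Tensor:
    """1 - cosine similarity between image embeddings and a text prompt.

    images: (B, 3, 224, 224) composited outputs in [0, 1]. A real training
    loop would apply CLIP's resizing/normalization while keeping gradients,
    which this sketch glosses over.
    """
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
    image_emb = model.get_image_features(pixel_values=images)
    sim = torch.nn.functional.cosine_similarity(image_emb, text_emb, dim=-1)
    return (1.0 - sim).mean()

# Example: encourage the composite to match the target edit description.
loss = clip_text_loss(torch.rand(2, 3, 224, 224), "a photo of a cake covered in oreo cookies")
```

Minimizing a term like this pushes the composited output toward the text prompt, while the layered compositing shown earlier keeps the rest of the input intact.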
Q: What are the recommended use cases?
The model is ideal for changing object textures and adding semi-transparent effects to images and videos. It's not designed for adding new objects or significantly altering the original spatial layout. Common applications include adding weather effects, changing material appearances, and creating atmospheric modifications.