Text2LIVE
| Property | Value |
|---|---|
| Paper | View on arXiv |
| Framework | PyTorch (>=1.10.0) |
| Author | Antiraedus |
What is Text2LIVE?
Text2LIVE is a groundbreaking framework for text-driven editing of real-world images and videos without requiring pre-training or user-provided edit masks. The model specializes in zero-shot appearance manipulation, allowing users to edit object textures or add visual effects through simple text prompts while maintaining high fidelity to the original input.
Implementation Details
The model works by generating an edit layer (color + opacity) that is composited over the original input, rather than generating the edited output directly, as sketched below. It uses CLIP to define its text-driven losses and trains on an internal dataset extracted from a single input image or video. The implementation requires significant GPU resources (a Tesla V100 32GB is recommended) and does not currently support mixed-precision training.
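To illustrate the layered formulation, here is a minimal, hypothetical PyTorch sketch of the compositing step: an edit layer with RGB color and an opacity map, both produced by the generator, is alpha-blended over the input. The tensor names and the stand-in `generator` module are illustrative assumptions, not the released code.

```python
import torch
import torch.nn as nn

def composite_edit_layer(image: torch.Tensor, edit_rgba: torch.Tensor) -> torch.Tensor:
    """Alpha-blend a predicted edit layer over the original image.

    image:     (B, 3, H, W) original input in [0, 1]
    edit_rgba: (B, 4, H, W) generator output; channels 0-2 are the edit
               color, channel 3 is the per-pixel opacity (alpha) map.
    """
    color = edit_rgba[:, :3]      # RGB of the edit layer
    alpha = edit_rgba[:, 3:4]     # opacity in [0, 1]
    return alpha * color + (1.0 - alpha) * image

# Illustrative usage with a stand-in generator (the real model is a CNN
# trained per input with CLIP-driven losses).
generator = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.Sigmoid())
image = torch.rand(1, 3, 256, 256)
edited = composite_edit_layer(image, generator(image))   # (1, 3, 256, 256)
```

Because only the edit layer is generated, regions where the opacity is near zero pass the original pixels through unchanged, which is what keeps the output faithful to the input. Beyond this compositing step, the key features of the approach are: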
- Zero-shot editing capability without pre-trained generators
- Internal learning approach using CLIP-driven losses
- Supports both image and video editing
- Utilizes Neural Layered Atlases for temporal consistency in videos
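For videos, the approach builds on Neural Layered Atlases, where each frame pixel has UV coordinates into a shared 2D atlas; applying the edit once in atlas space and mapping it back to every frame is what yields temporal consistency. The snippet below is a rough sketch of that mapping step using `grid_sample`; the atlas, the per-frame UV maps, and their shapes are assumptions for illustration, not the actual pipeline.

```python
import torch
import torch.nn.functional as F

def map_atlas_edit_to_frames(edited_atlas: torch.Tensor, uv_maps: torch.Tensor) -> torch.Tensor:
    """Sample an edited atlas back into each video frame.

    edited_atlas: (1, 3, Ha, Wa) atlas texture with the edit already applied.
    uv_maps:      (T, H, W, 2) per-frame atlas coordinates in [-1, 1]
                  (the convention expected by grid_sample).
    Returns:      (T, 3, H, W) frames that all share the same edit.
    """
    T = uv_maps.shape[0]
    atlas = edited_atlas.expand(T, -1, -1, -1)   # reuse the same atlas for every frame
    return F.grid_sample(atlas, uv_maps, mode="bilinear", align_corners=True)

# Illustrative shapes: a 10-frame clip sampled from a 512x512 atlas.
frames = map_atlas_edit_to_frames(torch.rand(1, 3, 512, 512),
                                  torch.rand(10, 128, 128, 2) * 2 - 1)
```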
Core Capabilities
- Localized and global texture editing of existing objects
- Addition of semi-transparent effects (smoke, fire, snow)
- High-resolution image and video processing
- Semantic-aware editing preserving original spatial layout
Frequently Asked Questions
Q: What makes this model unique?
Text2LIVE's uniqueness lies in its ability to perform zero-shot edits without pre-training or masks, while maintaining high fidelity to the original input through its novel layered approach and text-driven losses.
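As a rough illustration of what a CLIP-driven loss can look like, the sketch below scores a composited result against a target text prompt using an off-the-shelf CLIP model from Hugging Face `transformers`. The actual Text2LIVE objective combines several such terms (including losses on the edit layer itself), so treat this as a simplified, assumption-laden example rather than the paper's exact loss.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_text_loss(images: torch.Tensor, prompt: str) -> torch.Tensor:
    """1 - cosine similarity between image embeddings and a text prompt.

    images: (B, 3, 224, 224) composited outputs in [0, 1]. A real training
    loop would apply CLIP's resizing/normalization while keeping gradients,
    which this sketch glosses over.
    """
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
    image_emb = model.get_image_features(pixel_values=images)
    sim = torch.nn.functional.cosine_similarity(image_emb, text_emb, dim=-1)
    return (1.0 - sim).mean()

# Example: encourage the composite to match the target edit description.
loss = clip_text_loss(torch.rand(2, 3, 224, 224), "a photo of a cake covered in oreo cookies")
```

Minimizing a term like this pushes the composited output toward the text prompt, while the layered compositing shown earlier keeps the rest of the input intact.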
Q: What are the recommended use cases?
The model is ideal for changing object textures and adding semi-transparent effects to images and videos. It's not designed for adding new objects or significantly altering the original spatial layout. Common applications include adding weather effects, changing material appearances, and creating atmospheric modifications.