EliGen

Maintained by: modelscope

Property       Value
Author         ModelScope
Paper          EliGen: Entity-Level Controlled Image Generation with Regional Attention
Repository     DiffSynth-Studio
Model Access   Available on ModelScope and Hugging Face

What is EliGen?

EliGen is a text-to-image generation model that adds precise entity-level control. It uses a regional attention mechanism within a Diffusion Transformer (DiT) framework to control what is generated in specific regions of the image. The model supports entity-level controlled image generation and image inpainting, and it can be combined with existing community models such as IP-Adapter and In-Context LoRA.

Implementation Details

The model uses a regional attention mechanism that converts the positional information of each entity into an attention mask, ensuring precise control over each entity's designated region. It was trained on a curated entity-annotated dataset built from DiffusionDB, with Qwen2-VL 72B used to identify the entities. Training uses LoRA (Low-Rank Adaptation) and DeepSpeed for efficient fine-tuning.
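LoRA, mentioned above, keeps a pretrained weight W frozen and learns only a low-rank update, so an adapted layer computes W x + (alpha / r) B A x. The NumPy sketch below shows that generic formulation only; the class name `LoRALinear` and the rank, alpha, and initialization values are hypothetical choices for illustration, not EliGen's actual training configuration.

```python
import numpy as np


class LoRALinear:
    """Hypothetical generic LoRA layer: y = x W^T + (alpha / r) * x A^T B^T.

    W is frozen; only the low-rank factors A and B would be trained.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen pretrained weight, shape (out, in)
        self.scale = alpha / rank
        # Common LoRA init: small random A and zero B, so the update starts at zero.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.B = np.zeros((out_features, rank))

    def __call__(self, x):
        # x has shape (..., in_features): frozen path plus scaled low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the adapter into a single dense weight for inference.
        return self.weight + self.scale * (self.B @ self.A)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    base = rng.normal(size=(8, 16))
    layer = LoRALinear(base, rank=2)
    x = rng.normal(size=(3, 16))
    print(np.allclose(layer(x), x @ base.T))  # True: B starts at zero
```

Because the update is a plain matrix product, it can be merged into the base weight after training (`merged_weight`), so the adapted layer costs no more at inference than the original one.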

  • Regional attention mechanism for precise entity control
  • Entity-annotated dataset with local prompts and bounding boxes
  • Integration capabilities with IP-Adapter and In-Context LoRA
  • Support for multiple generation modes including inpainting
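The idea of turning entity positions into an attention mask can be sketched as follows. This is a simplified illustration, not EliGen's implementation, and it rests on several assumptions: a joint token sequence laid out as [global prompt | entity prompts 1 to k | image tokens], bounding boxes given as normalized (x0, y0, x1, y1), and a rule under which each entity prompt exchanges attention only with itself and with the image tokens inside its box, while the global prompt and the image tokens attend to each other freely. The function names are hypothetical.

```python
import numpy as np


def box_to_token_mask(box, grid_h, grid_w):
    """Mark latent-grid cells whose centers fall inside a normalized box.

    box is (x0, y0, x1, y1) in [0, 1] (assumed convention).
    Returns a flat bool array of length grid_h * grid_w in row-major order.
    """
    x0, y0, x1, y1 = box
    ys = (np.arange(grid_h) + 0.5) / grid_h  # cell-center y coordinates
    xs = (np.arange(grid_w) + 0.5) / grid_w  # cell-center x coordinates
    inside = (
        (ys[:, None] >= y0) & (ys[:, None] < y1)
        & (xs[None, :] >= x0) & (xs[None, :] < x1)
    )
    return inside.reshape(-1)


def build_regional_attention_mask(global_len, entity_lens, boxes, grid_h, grid_w):
    """Return an (N, N) bool mask where True means "query may attend to key".

    Assumed sequence layout (illustrative only):
    [global prompt | entity prompt 1 | ... | entity prompt k | image tokens]
    """
    n_img = grid_h * grid_w
    total = global_len + sum(entity_lens) + n_img
    mask = np.zeros((total, total), dtype=bool)

    g = slice(0, global_len)
    img = slice(total - n_img, total)
    img_positions = np.arange(total - n_img, total)

    # Global prompt and image tokens attend to themselves and to each other.
    mask[g, g] = True
    mask[g, img] = True
    mask[img, g] = True
    mask[img, img] = True

    # Entity prompts never attend to each other or to the global prompt here.
    start = global_len
    for length, box in zip(entity_lens, boxes):
        e = slice(start, start + length)
        region = img_positions[box_to_token_mask(box, grid_h, grid_w)]
        mask[e, e] = True       # entity prompt attends within itself
        mask[e, region] = True  # entity prompt -> image tokens in its box
        mask[region, e] = True  # image tokens in its box -> entity prompt
        start += length
    return mask


if __name__ == "__main__":
    # Two entities: one in the left half, one in the right half of a 2x2 grid.
    m = build_regional_attention_mask(
        global_len=2, entity_lens=[1, 1],
        boxes=[(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)],
        grid_h=2, grid_w=2,
    )
    bias = np.where(m, 0.0, -np.inf)  # additive bias applied before softmax
    print(m.astype(int))
```

A boolean mask of this kind is usually converted into an additive bias (0 where attention is allowed, negative infinity where it is blocked) and added to the attention logits, as the `bias` line in the usage example shows.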

Core Capabilities

  • Entity-level controlled image generation with precise positioning
  • Image inpainting with entity-level modifications
  • Styled entity control through IP-Adapter integration
  • Entity transfer functionality
  • Interactive UI for experimenting with the model

Frequently Asked Questions

Q: What makes this model unique?

EliGen's distinctive feature is its ability to provide precise entity-level control during image generation through its regional attention mechanism, allowing for specific placement and modification of individual elements within generated images.

Q: What are the recommended use cases?

The model is ideal for applications requiring precise control over image generation, including: detailed artistic compositions, specific layout designs, targeted image inpainting, style transfer projects, and scenarios requiring fine-grained control over individual image elements.
