Lumina-T2I

Alpha-VLLM

Lumina-T2I is a text-to-image generation model that uses a Large-DiT backbone with a LLaMA2-7B text encoder and the SDXL VAE, and supports 1024x1024 resolution.

Property	Value
License	Apache 2.0
Model Type	Text-to-Image Generation
Architecture	Large-DiT + LLaMA2-7B + SDXL VAE
Resolution	1024x1024

What is Lumina-T2I?

Lumina-T2I generates images from text by combining a Large-DiT backbone, a LLaMA2-7B text encoder, and the SDXL VAE. It is designed to produce detailed images at low training cost, and it supports various text encoders and models of different parameter sizes.

Implementation Details

The model architecture consists of three main components: a Large-DiT backbone for image generation, LLaMA2-7B for text encoding, and stabilityai's fine-tuned SDXL VAE. Images are sampled by solving an ODE along a configurable transport path, and the model supports multiple sampling methods and diffusion forms.

Multiple resolution formats, including 1024x1024, 512x2048, and 2048x512
Configurable number of sampling steps (1-1000) and classifier-free guidance (CFG) scale (1-20)
Multiple ODE solver options, including Euler, Dopri5, and Dopri8
Additional options such as NTK scaling and proportional attention
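
The interaction between the sampling steps, the CFG scale, and the Euler solver can be sketched as follows. This is a minimal illustration, not the Lumina-T2I implementation: toy_velocity is a hypothetical stand-in for the Large-DiT network, the function names and default values are made up for the example, and the convention that t=0 is noise and t=1 is the image is an assumption.

```python
import numpy as np

def cfg_velocity(velocity_fn, x, t, cond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by a factor of cfg_scale."""
    v_uncond = velocity_fn(x, t, None)
    v_cond = velocity_fn(x, t, cond)
    return v_uncond + cfg_scale * (v_cond - v_uncond)

def euler_sample(velocity_fn, x0, cond, num_steps=30, cfg_scale=4.0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.
    More steps mean a smaller step size and a more accurate trajectory."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * cfg_velocity(velocity_fn, x, t, cond, cfg_scale)
    return x

def toy_velocity(x, t, cond):
    """Hypothetical stand-in for the network: pulls x toward the
    conditioning vector, or toward zero when there is no condition."""
    target = np.zeros_like(x) if cond is None else cond
    return target - x

noise = np.random.default_rng(0).standard_normal(4)
prompt_embedding = np.full(4, 2.0)  # hypothetical conditioning vector
sample = euler_sample(toy_velocity, noise, prompt_embedding,
                      num_steps=50, cfg_scale=1.0)
print(sample)
```

With cfg_scale=1.0 the guided velocity equals the conditional prediction; larger values push the sample further toward the prompt. An adaptive solver such as Dopri5 would replace the fixed-step loop with error-controlled steps.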

Core Capabilities

High-quality image generation from text descriptions
Flexible resolution support with extrapolation capabilities
Customizable inference settings for different image styles
CLI and web demo interfaces
Support for various transport paths and prediction models
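
As a rough illustration of why 1024x1024, 512x2048, and 2048x512 can be handled by the same model, the sketch below counts the latent tokens the backbone would see at each resolution. It relies on two assumptions: the SDXL VAE reduces each spatial side by a factor of 8, and the backbone uses 2x2 latent patches (a common DiT setting, not confirmed here for Lumina-T2I). The function names are hypothetical.

```python
VAE_DOWNSAMPLE = 8  # SDXL VAE: each spatial side shrinks by 8x
PATCH_SIZE = 2      # assumption: DiT-style 2x2 patches over the latent

def token_grid(width, height,
               vae_downsample=VAE_DOWNSAMPLE, patch_size=PATCH_SIZE):
    """Return (rows, cols) of the token grid for an image of the given size."""
    stride = vae_downsample * patch_size
    if width % stride or height % stride:
        raise ValueError(f"width and height must be multiples of {stride}")
    return height // stride, width // stride

def token_count(width, height):
    """Total number of image tokens fed to the backbone."""
    rows, cols = token_grid(width, height)
    return rows * cols

# The resolution strings are read here as width x height; the order
# does not change the token count.
for width, height in [(1024, 1024), (512, 2048), (2048, 512)]:
    rows, cols = token_grid(width, height)
    print(f"{width}x{height}: {rows}x{cols} grid, {rows * cols} tokens")
```

Under these assumptions all three formats produce 4096 tokens, so they differ only in aspect ratio, not in sequence length. Going beyond that token budget is where resolution extrapolation comes in.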

Frequently Asked Questions

Q: What makes this model unique?

Lumina-T2I aims for high-quality image generation at low training cost by combining Large-DiT, LLaMA2-7B, and the SDXL VAE. It also exposes many inference settings and supports several resolution formats.

Q: What are the recommended use cases?

The model suits image generation tasks that need fine control over the generation process. It is a particularly good fit for applications that need flexible resolution support or both CLI and web-based interfaces.