Kandinsky-3

Property	Value
License	Apache 2.0
Pipeline Type	Text-to-Image
Architecture Size	~11.9B parameters total
Components	Text Encoder (8.6B), U-Net (3B), MoVQ (267M)

What is Kandinsky-3?

Kandinsky-3 is an open-source text-to-image diffusion model that builds on the earlier Kandinsky 2.x family. It is designed to improve the generation of images related to Russian culture while remaining a capable general-purpose image generator. The architecture comprises three components: an 8.6B-parameter Flan-UL2 text encoder, a 3B-parameter latent diffusion U-Net, and a 267M-parameter MoVQ encoder/decoder.
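The headline parameter count follows from the component sizes. As a rough sketch, the arithmetic below sums them and estimates the memory needed for the weights alone. The bytes-per-parameter figures (2 for FP16, 4 for FP32) are standard, but the helper names are illustrative and the estimate ignores activations and framework overhead, so treat it as a lower bound.

```python
# Component sizes in parameters, as listed in the summary table above.
COMPONENTS = {
    "text_encoder": 8.6e9,  # Flan-UL2 encoder
    "unet": 3.0e9,          # latent diffusion U-Net
    "movq": 267e6,          # MoVQ encoder/decoder
}


def total_params(components):
    """Sum the parameter counts of all components."""
    return sum(components.values())


def weight_memory_gb(n_params, bytes_per_param):
    """Memory for the weights alone, in decimal GB (no activations)."""
    return n_params * bytes_per_param / 1e9


total = total_params(COMPONENTS)
print(f"total: {total / 1e9:.1f}B parameters")             # total: 11.9B parameters
print(f"FP16 weights: {weight_memory_gb(total, 2):.1f} GB")  # FP16 weights: 23.7 GB
print(f"FP32 weights: {weight_memory_gb(total, 4):.1f} GB")  # FP32 weights: 47.5 GB
```

At roughly 24 GB for FP16 weights alone, the full model does not fit comfortably on many single consumer GPUs, which is why half precision and CPU offloading matter in practice.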

Implementation Details

The model is implemented in the Diffusers framework and is available in base and inpainting variants. The base model was trained for 2M steps on 400 A100 GPUs, and the inpainting variant was fine-tuned for a further 250k steps on 300 A100 GPUs.

Sophisticated text understanding through an enhanced text encoder
Improved visual quality via a larger diffusion U-Net
Specialized Russian cultural content generation
Support for both text-to-image and image-to-image pipelines
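A minimal text-to-image sketch with Diffusers might look like the following. It assumes the weights are published on the Hugging Face Hub as `kandinsky-community/kandinsky-3` with an `fp16` variant, and that `torch`, `diffusers`, and `accelerate` are installed; the helper names `load_text2image` and `generate` are illustrative, not part of any library.

```python
MODEL_ID = "kandinsky-community/kandinsky-3"  # assumed Hub repository id


def load_text2image(model_id=MODEL_ID):
    """Load the text-to-image pipeline in FP16 with model CPU offloading."""
    # Imported lazily so this module can be imported without the heavy deps.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, variant="fp16", torch_dtype=torch.float16
    )
    # Keeps sub-models on the CPU and moves each to the GPU only while it runs.
    pipe.enable_model_cpu_offload()
    return pipe


def generate(pipe, prompt, steps=25, seed=0):
    """Generate one image; a fixed seed makes the result reproducible."""
    import torch

    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(prompt, num_inference_steps=steps, generator=generator)
    return result.images[0]


# Example (downloads the weights and needs a CUDA GPU):
# pipe = load_text2image()
# image = generate(pipe, "A matryoshka doll on a snowy windowsill, oil painting")
# image.save("matryoshka.png")
```

The step count and seed defaults here are arbitrary starting points rather than recommended settings.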

Core Capabilities

High-quality image generation from text descriptions
Image-to-image transformation with controllable strength
Inpainting functionality
Efficient CPU offloading support
FP16 precision support for optimized performance
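The image-to-image capability and its strength control can be sketched as follows. The `effective_steps` helper is hypothetical; it mirrors how Diffusers image-to-image pipelines commonly turn `strength` into the number of denoising steps actually run, which is an assumption about the Kandinsky-3 pipeline rather than something stated here. The Hub id `kandinsky-community/kandinsky-3` is likewise assumed.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate denoising steps run for a strength in [0, 1].

    Low strength keeps the result close to the input image;
    strength 1.0 runs the full schedule and mostly ignores the input.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


def transform(init_image, prompt, strength=0.75, steps=25, seed=0,
              model_id="kandinsky-community/kandinsky-3"):  # assumed Hub id
    """Image-to-image in FP16 with CPU offloading; init_image is a PIL image."""
    # Imported lazily so the helper above works without the heavy deps.
    import torch
    from diffusers import AutoPipelineForImage2Image

    pipe = AutoPipelineForImage2Image.from_pretrained(
        model_id, variant="fp16", torch_dtype=torch.float16
    )
    pipe.enable_model_cpu_offload()
    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(prompt, image=init_image, strength=strength,
                  num_inference_steps=steps, generator=generator)
    return result.images[0]


print(effective_steps(25, 0.75))  # 18
print(effective_steps(25, 0.3))   # 7
```

Under this assumption, lowering `strength` both preserves more of the input image and shortens generation, since fewer denoising steps are executed.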

Frequently Asked Questions

Q: What makes this model unique?

Kandinsky-3 is distinguished by its focus on Russian cultural content combined with strong general-purpose image generation. Its large Flan-UL2 text encoder and three-component architecture are intended to improve both text understanding and image quality.

Q: What are the recommended use cases?

The model is best suited to creative image generation, particularly prompts involving Russian cultural elements. Typical uses include artistic projects, content creation, and design work that needs high-quality image generation or image-to-image transformation.
