Taiyi-Stable-Diffusion-XL-3.5B

IDEA-CCNL

Bilingual text-to-image diffusion model with 3.5B parameters that accepts both Chinese and English prompts. It pairs the Stable Diffusion XL architecture with a CLIP-based text encoder whose vocabulary and position encoding have been expanded.

Property            Value
License             Apache 2.0
Paper               arXiv:2401.14688
Language Support    English, Chinese (Bilingual)
Framework           Diffusers

What is Taiyi-Stable-Diffusion-XL-3.5B?

Taiyi-Stable-Diffusion-XL-3.5B is a bilingual text-to-image generation model that builds on Stable Diffusion XL and adds Chinese language support. It is designed to follow text prompts written in either English or Chinese.

Implementation Details

The model is trained in three stages and uses an enhanced CLIP text encoder with an expanded vocabulary and position encoding. It is built on the Stable-Diffusion-XL architecture and trained on high-quality image-text pairs whose detailed descriptive captions were generated by vision-language models.

  • Multi-resolution and multi-aspect ratio training pipeline
  • Enhanced CLIP-based text encoder with bilingual capabilities
  • Memory-efficient training approach with contrastive loss function
  • Support for both Chinese and English text prompts
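The multi-resolution, multi-aspect-ratio training mentioned above is commonly implemented with aspect-ratio bucketing: each image is assigned to the training resolution whose shape is closest to its own. The following is a minimal sketch of that general idea, not the authors' pipeline. The function names, the 1024x1024 target area, the 64-pixel step, and the side limits are all illustrative assumptions.

```python
import math


def make_buckets(target_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) pairs with roughly constant pixel area.

    All default values are assumptions chosen for illustration only.
    """
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # Largest height (rounded down to a multiple of `step`) within the area budget.
        height = (target_area // width) // step * step
        if min_side <= height <= max_side:
            buckets.append((width, height))
    return buckets


def nearest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's (in log space)."""
    log_ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ratio))


if __name__ == "__main__":
    buckets = make_buckets()
    print(nearest_bucket(1920, 1080, buckets))  # a landscape bucket
    print(nearest_bucket(2000, 2000, buckets))  # the square bucket
```

Comparing ratios in log space treats a 2:1 and a 1:2 image symmetrically, which a plain ratio difference would not.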

Core Capabilities

  • Bilingual text-to-image generation from English or Chinese prompts
  • Reported CLIP similarity scores of 0.254 for English and 0.225 for Chinese
  • Improved FID scores compared to previous models
  • Photorealistic image generation capabilities
  • Support for various artistic styles and compositions
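For context on the CLIP similarity figures above: the metric is usually the cosine similarity between an image embedding and the embedding of its prompt, averaged over an evaluation set. The sketch below shows only that arithmetic on plain lists of floats. It assumes the embeddings have already been produced by some CLIP model, and the function names are hypothetical. It does not reproduce the evaluation behind the reported numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_clip_similarity(pairs):
    """Average similarity over (image_embedding, text_embedding) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    scores = [cosine_similarity(image, text) for image, text in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for real CLIP embeddings.
    toy_pairs = [([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]),
                 ([0.0, 2.0, 0.0], [0.0, 1.0, 0.0])]
    print(mean_clip_similarity(toy_pairs))
```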

Frequently Asked Questions

Q: What makes this model unique?

The model's distinguishing feature is its bilingual support: it is reported to outperform existing open-source alternatives in both English and Chinese text-to-image generation while maintaining image quality and accurate prompt following.

Q: What are the recommended use cases?

The model suits applications that need high-quality images from English or Chinese text prompts, including digital art creation, content generation, and visual design. It is particularly effective for photographic-style outputs, and generation can be accelerated using LCM (Latent Consistency Model).
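Since the listed framework is Diffusers, generating an image from a Chinese or English prompt might look like the sketch below. Several things here are assumptions rather than details from this page: the Hub repository id, that the checkpoint loads with the standard `StableDiffusionXLPipeline`, the step count and guidance scale, and the availability of a CUDA GPU. The helper names are hypothetical, and LCM acceleration is not shown.

```python
# Assumed Hugging Face Hub repository id; verify before use.
MODEL_ID = "IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B"


def generation_kwargs(prompt, steps=30, guidance_scale=7.5):
    """Collect the arguments for the pipeline call (default values are illustrative)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(prompt, **overrides):
    """Load the pipeline and return one PIL image for the prompt."""
    # Heavy dependencies are imported here so the module can be imported
    # without torch or diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU
    return pipe(**generation_kwargs(prompt, **overrides)).images[0]


if __name__ == "__main__":
    # The same call works with a Chinese or an English prompt.
    image = generate("一只戴着草帽的猫，摄影风格")  # "a cat wearing a straw hat, photographic style"
    image.save("cat.png")
```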
