Small Stable Diffusion v0

Property	Value
License	OpenRAIL
Author	OFA-Sys
Primary Task	Text-to-Image Generation
Training Infrastructure	8 x A100-80GB GPUs

What is small-stable-diffusion-v0?

Small-stable-diffusion-v0 is an optimized version of the original Stable Diffusion model that achieves comparable image generation quality at approximately half the size. Reported performance improvements include a 4x speedup on GPU (using TensorRT) and a 12x speedup on CPU (using Intel OpenVINO), which brings image generation down to about 5 seconds on compatible CPU hardware.
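As a rough illustration of how the model might be used, the sketch below loads it through the Hugging Face diffusers library. The repository id is an assumption formed from the author and model name above, and the step count and guidance scale are illustrative defaults rather than recommended settings. This plain PyTorch path does not use TensorRT or OpenVINO, so it should not be expected to reach the timings quoted above.

```python
# Assumed Hub repository id, formed from the author and model name above.
MODEL_ID = "OFA-Sys/small-stable-diffusion-v0"


def load_pipeline(model_id=MODEL_ID, device="cpu"):
    """Load the text-to-image pipeline (downloads weights on first use)."""
    # Imported lazily so the helpers can be imported without the heavy deps.
    import torch
    from diffusers import DiffusionPipeline

    # Half precision is only used on GPU; CPU inference stays in float32.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    return pipe.to(device)


def generate(pipe, prompt, steps=25, guidance_scale=7.5):
    """Run one prompt through the pipeline and return the first image."""
    result = pipe(prompt, num_inference_steps=steps, guidance_scale=guidance_scale)
    return result.images[0]


# Example (requires torch and diffusers, and downloads the weights):
#   pipe = load_pipeline(device="cpu")
#   generate(pipe, "an apple, 4k").save("apple.png")
```

Keeping loading and generation in separate functions lets the pipeline be created once and reused across prompts.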

Implementation Details

The model is initialized from Stable Diffusion v1-4 and trained in three stages. Its UNet is configured with layers_per_block=1 (the original uses 2), keeping the first layer of each block from the original model. Training consists of one pretraining stage followed by two knowledge distillation stages, using v1-4 and then v1-5 as the teacher model.
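The layer selection described above can be pictured as a filter over the names of the original UNet weights. The sketch below is only a guess at the idea, not the authors' procedure: it assumes diffusers-style key names such as "down_blocks.0.resnets.1.conv1.weight", and the function name and the keep_down and keep_up parameters are hypothetical. The default keep_up=2 reflects that diffusers builds layers_per_block + 1 layers in each up block. It works on key names only and does not check that the retained tensors have shapes compatible with the smaller model.

```python
import re

# Matches per-layer entries such as "down_blocks.0.resnets.1.conv1.weight".
_LAYER_RE = re.compile(
    r"^(down_blocks|up_blocks)\.(\d+)\.(resnets|attentions)\.(\d+)\."
)


def select_first_layers(keys, keep_down=1, keep_up=2):
    """Keep only the leading layers of each down/up block (hypothetical helper)."""
    kept = []
    for key in keys:
        match = _LAYER_RE.match(key)
        if match is None:
            # Not a per-layer weight inside a down/up block: keep unchanged.
            kept.append(key)
            continue
        limit = keep_down if match.group(1) == "down_blocks" else keep_up
        if int(match.group(4)) < limit:
            kept.append(key)
    return kept


example_keys = [
    "conv_in.weight",
    "down_blocks.0.resnets.0.conv1.weight",
    "down_blocks.0.resnets.1.conv1.weight",
    "up_blocks.1.resnets.1.conv1.weight",
    "up_blocks.1.resnets.2.conv1.weight",
    "mid_block.resnets.1.conv1.weight",
]
selected = select_first_layers(example_keys)
```

In this toy example the second down-block resnet and the third up-block resnet are dropped, while the input convolution and mid-block weights pass through untouched.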

Stage 1: 500,000 steps of pretraining the UNet
Stage 2: 400,000 steps of distillation using SD v1-4 as the teacher
Stage 3: 200,000 steps of further distillation using SD v1-5 as the teacher
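The three stages can be written down as a small schedule together with a generic distillation objective. The loss below is a common formulation, assumed here for illustration and not confirmed by the source: the student's noise prediction is compared with both the true noise and the teacher's prediction, and the weighting factor alpha is hypothetical. Plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
# (stage kind, training steps, teacher model) as listed above.
STAGES = [
    ("pretrain", 500_000, None),
    ("distill", 400_000, "SD v1-4"),
    ("distill", 200_000, "SD v1-5"),
]


def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_pred, true_noise, teacher_pred=None, alpha=0.5):
    """Blend the ground-truth loss with a teacher-matching loss.

    alpha is a hypothetical weight; without a teacher (pretraining) only
    the ground-truth term is used.
    """
    hard = mse(student_pred, true_noise)
    if teacher_pred is None:
        return hard
    soft = mse(student_pred, teacher_pred)
    return (1 - alpha) * hard + alpha * soft


# Total number of optimization steps across all three stages.
total_steps = sum(steps for _, steps, _ in STAGES)
```

Summing the schedule gives 1,100,000 training steps in total.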

Core Capabilities

Fast inference: about 5 seconds per image on CPU (Intel OpenVINO) and a 4x speedup on GPU (TensorRT)
Image quality comparable to the original Stable Diffusion
Lower resource usage at approximately half the model size
Support for various diffusion schedulers
Integration with popular frameworks like Gradio
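To illustrate the scheduler support listed above, the sketch below swaps the scheduler on an already loaded diffusers pipeline. The helper name swap_scheduler is hypothetical; the from_config pattern is the usual diffusers way to build a new scheduler from an existing scheduler's configuration. Which schedulers give good results with this particular model is not stated in the source.

```python
def swap_scheduler(pipe, scheduler_cls):
    """Replace pipe.scheduler with scheduler_cls built from the current config."""
    # Reusing the existing config keeps settings such as the number of
    # training timesteps consistent between the old and new scheduler.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe


# Example (requires diffusers and a loaded pipeline named `pipe`):
#   from diffusers import DPMSolverMultistepScheduler
#   pipe = swap_scheduler(pipe, DPMSolverMultistepScheduler)
```

Because the helper only relies on the scheduler attribute and the from_config class method, it should apply to any scheduler class that follows the same pattern.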

Frequently Asked Questions

Q: What makes this model unique?

The model's primary distinction is that it keeps image quality comparable to the original Stable Diffusion while being approximately half the size and considerably faster at inference. It achieves this by reducing the number of layers per block and then training the smaller UNet with two stages of knowledge distillation.

Q: What are the recommended use cases?

The model is well suited to research, educational tools, artistic applications, and scenarios where computational efficiency matters but good image quality is still required. It should not be used to generate harmful, offensive, or misleading content.