Small Stable Diffusion v0
| Property | Value |
|---|---|
| License | OpenRAIL |
| Author | OFA-Sys |
| Primary Task | Text-to-Image Generation |
| Training Infrastructure | 8 x A100-80GB GPUs |
What is small-stable-diffusion-v0?
Small-stable-diffusion-v0 is an optimized version of the original Stable Diffusion model that achieves comparable image generation quality while being approximately half the size. Reported speedups are 4x on GPU (using TensorRT) and 12x on CPU (using Intel OpenVINO), which allows an image to be generated in about 5 seconds on compatible CPU hardware.
Implementation Details
The model was initialized from Stable Diffusion v1-4 and trained in three stages. It is built with layers_per_block=1, so each block has fewer layers than in the original model, and its weights were initialized by selecting the first layer of each block from the original model. Training consisted of a pretraining stage followed by two knowledge distillation stages, using v1-4 and then v1-5 as the teacher model:
- Stage 1: 500,000 steps of pretraining the UNet
- Stage 2: 400,000 steps of distillation using SD v1-4
- Stage 3: 200,000 steps of advanced distillation using SD v1-5
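The initialization and schedule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' conversion script: the parameter names follow the diffusers-style UNet naming (`down_blocks.N.resnets.M...`), the helper `select_first_layers` and the toy dictionary are hypothetical, and a real conversion may treat some blocks (for example the up blocks) differently.

```python
import re

# Assumption: diffusers-style parameter names, where the fourth field is the
# layer index inside a block, e.g. "down_blocks.1.resnets.0.conv1.weight".
_LAYER = re.compile(r"^(down_blocks|up_blocks)\.(\d+)\.(resnets|attentions)\.(\d+)\.")


def select_first_layers(teacher_state, layers_per_block=1):
    """Keep only the first `layers_per_block` layers of every block.

    Parameters that are not per-block layers (for example `conv_in` or the
    downsamplers) are copied unchanged.
    """
    student_state = {}
    for name, tensor in teacher_state.items():
        match = _LAYER.match(name)
        if match is None or int(match.group(4)) < layers_per_block:
            student_state[name] = tensor
    return student_state


# Toy stand-in for a teacher state dict (values would be tensors in practice).
toy_teacher = {
    "conv_in.weight": "w0",
    "down_blocks.0.resnets.0.conv1.weight": "w1",
    "down_blocks.0.resnets.1.conv1.weight": "w2",
    "down_blocks.0.attentions.0.proj_in.weight": "w3",
    "down_blocks.0.attentions.1.proj_in.weight": "w4",
    "down_blocks.0.downsamplers.0.conv.weight": "w5",
}
toy_student = select_first_layers(toy_teacher)

# The three training stages and their step counts, as listed above.
STAGES = [
    ("pretrain UNet", 500_000),
    ("distill from SD v1-4", 400_000),
    ("distill from SD v1-5", 200_000),
]
TOTAL_STEPS = sum(steps for _, steps in STAGES)

if __name__ == "__main__":
    print(sorted(toy_student))
    print(f"total training steps: {TOTAL_STEPS:,}")
```

In the toy example, the second resnet and second attention layer of the block are dropped and everything else is kept. Summing the stages gives 1,100,000 training steps in total.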
Core Capabilities
- Fast inference: about 5 seconds per image on CPU (Intel OpenVINO) and a 4x speedup on GPU (TensorRT)
- Comparable image quality to original Stable Diffusion
- Efficient resource utilization with smaller model size
- Support for various diffusion schedulers
- Integration with popular frameworks like Gradio
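As a rough illustration of the scheduler support mentioned above, the sketch below loads the model with the diffusers library and swaps in a different scheduler. The Hub identifier `OFA-Sys/small-stable-diffusion-v0` is an assumption inferred from the author and model name, the helper names `build_pipeline` and `generate` are hypothetical, and the step count and guidance scale are arbitrary defaults rather than recommended settings.

```python
# Assumption: the model is published on the Hugging Face Hub under this ID.
MODEL_ID = "OFA-Sys/small-stable-diffusion-v0"


def build_pipeline(model_id=MODEL_ID, device="cpu"):
    """Load the pipeline and replace its default scheduler."""
    # Imported lazily so this file can be loaded without diffusers installed.
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    # Any compatible scheduler can be built from the existing scheduler config.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    return pipe.to(device)


def generate(prompt, steps=20, guidance_scale=7.5, device="cpu"):
    """Generate a single PIL image for `prompt`."""
    pipe = build_pipeline(device=device)
    result = pipe(prompt, num_inference_steps=steps, guidance_scale=guidance_scale)
    return result.images[0]


if __name__ == "__main__":
    # Downloads the weights on first use; requires `diffusers` and `torch`.
    image = generate("an apple, 4k")
    image.save("sample.png")
```

The same pipeline object can be wrapped in a Gradio interface or exported to another runtime; those steps are not shown here.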
Frequently Asked Questions
Q: What makes this model unique?
The model's main distinction is that it keeps image quality comparable to the original Stable Diffusion at roughly half the size and with faster inference, which it achieves through two stages of knowledge distillation.
Q: What are the recommended use cases?
The model is well suited to research, educational tools, artistic applications, and settings where compute is limited but good image quality is still required. It should not be used to generate harmful, offensive, or misleading content.