CoreML Stable Diffusion 2.1 Base
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Original Paper | High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022) |
| Platform Support | Apple Silicon devices |
| Primary Use | Text-to-image generation |
What is coreml-stable-diffusion-2-1-base?
This is a Core ML-optimized version of the Stable Diffusion 2.1 base model, converted for efficient execution on Apple Silicon devices. It was fine-tuned from the stable-diffusion-2-base model for 220k additional training steps with a relaxed dataset safety filter (punsafe=0.98). The model ships in two variants: a split_einsum version compatible with all compute units, including the Neural Engine, and an original version intended for CPU and GPU execution.
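For orientation, here is a minimal Swift sketch (not from the model card) of how the two variants map onto Core ML compute units via MLModelConfiguration; the model file path is a hypothetical placeholder.

```swift
import CoreML
import Foundation

// split_einsum variant: attention is chunked so the Apple Neural Engine can be used.
let aneConfig = MLModelConfiguration()
aneConfig.computeUnits = .all                 // CPU + GPU + Neural Engine

// original variant: standard attention, intended for CPU and GPU only.
let gpuConfig = MLModelConfiguration()
gpuConfig.computeUnits = .cpuAndGPU

// A compiled Core ML model (e.g. the UNet) is then loaded with the configuration
// that matches the variant in use; the path below is a placeholder.
// let unet = try MLModel(
//     contentsOf: URL(fileURLWithPath: "/path/to/Unet.mlmodelc"),
//     configuration: aneConfig)
```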
Implementation Details
The model uses a Latent Diffusion architecture with a fixed, pretrained OpenCLIP-ViT/H text encoder. Images pass through an autoencoder with a relative downsampling factor of f = 8, mapping inputs of shape H x W x 3 to latents of shape H/f x W/f x 4. The model also integrates with applications such as Mochi Diffusion for practical image generation tasks.
- Optimized performance on Apple Silicon through Core ML conversion
- Trained with a punsafe=0.98 dataset safety-filter threshold
- Compatible with Neural Engine through split_einsum implementation
- Supports 512x512 resolution image generation (see the latent-shape sketch after this list)
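As a quick check on the numbers above, this small Swift sketch computes the latent shape implied by the f = 8 autoencoder for the default 512x512 resolution; the variable names are illustrative only.

```swift
// Latent shape implied by the f = 8 autoencoder for a 512x512 input image.
let f = 8
let (height, width) = (512, 512)                 // input image is H x W x 3
let latentShape = (height / f, width / f, 4)     // H/f x W/f x 4
print("512x512x3 image -> latents \(latentShape)")   // prints "(64, 64, 4)"
```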
Core Capabilities
- High-quality text-to-image generation
- Efficient processing on Apple devices
- Research and artistic applications
- Educational and creative tool integration
Frequently Asked Questions
Q: What makes this model unique?
A: Its value lies in its Core ML optimization for Apple Silicon. It ships both split_einsum and original variants to match different compute units, while preserving the generation quality of Stable Diffusion 2.1.
Q: What are the recommended use cases?
A: The model is recommended for research, artistic creation, educational tools, and design applications. It is particularly suitable for users on Apple Silicon devices who need efficient, local text-to-image generation.
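As an illustration of local generation, the sketch below assumes Apple's open-source ml-stable-diffusion Swift package (StableDiffusionPipeline); the resource path and prompt are placeholders, and argument labels may differ between package versions, so treat it as a starting point rather than a drop-in snippet.

```swift
import CoreGraphics
import CoreML
import Foundation
import StableDiffusion   // Apple's ml-stable-diffusion Swift package (assumed dependency)

// Sketch: generate an image locally on Apple Silicon using the split_einsum variant.
func generateLocally() throws -> [CGImage?] {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine    // matches the split_einsum variant

    let pipeline = try StableDiffusionPipeline(
        resourcesAt: URL(fileURLWithPath: "/path/to/coreml-stable-diffusion-2-1-base"),  // placeholder
        configuration: config,
        reduceMemory: true)
    try pipeline.loadResources()

    var params = StableDiffusionPipeline.Configuration(
        prompt: "a watercolor painting of a lighthouse at sunrise")   // illustrative prompt
    params.stepCount = 25
    params.seed = 93
    params.imageCount = 1

    return try pipeline.generateImages(configuration: params)
}
```

Applications such as Mochi Diffusion wrap this kind of pipeline behind a graphical interface, so the same compiled model files can be used either way.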