coreml-stable-diffusion-2-1-base

Maintained By
coreml-community

CoreML Stable Diffusion 2.1 Base

Property          Value
License           CreativeML OpenRAIL-M
Original Paper    High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022)
Platform Support  Apple Silicon devices
Primary Use       Text-to-image generation

What is coreml-stable-diffusion-2-1-base?

This is a Core ML-optimized version of the Stable Diffusion 2.1 base model, converted for efficient execution on Apple Silicon devices. It builds upon the stable-diffusion-2-base model with 220k additional training steps and a stricter safety filter. The model ships in two variants: a split_einsum version compatible with all compute units, including the Neural Engine, and an original version intended for CPU and GPU execution.
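The variant choice maps directly onto Core ML's compute-unit settings. Below is a minimal sketch of loading one of the converted models with an explicit MLModelConfiguration; the file path is a placeholder, but MLModelConfiguration, MLComputeUnits, and MLModel(contentsOf:configuration:) are standard Core ML APIs.

```swift
import CoreML
import Foundation

// Placeholder path to one of this repo's compiled Core ML bundles.
let modelURL = URL(fileURLWithPath: "/path/to/TextEncoder.mlmodelc")

let config = MLModelConfiguration()
// split_einsum variant: may use every compute unit, including the Neural Engine.
config.computeUnits = .all
// original variant: restrict execution to CPU and GPU instead.
// config.computeUnits = .cpuAndGPU

do {
    // Load the compiled model with the requested compute units.
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    print(model.modelDescription)
} catch {
    print("Failed to load model: \(error)")
}
```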

Implementation Details

The model uses a latent diffusion architecture with a fixed, pretrained OpenCLIP-ViT/H text encoder. An autoencoder with downsampling factor f = 8 maps images of shape H x W x 3 to latents of shape H/f x W/f x 4, so diffusion runs in a compressed latent space. The model integrates with applications such as Mochi Diffusion for practical image generation tasks.
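As a concrete instance of the mapping above: at the model's native 512 x 512 resolution, a 512 x 512 x 3 input image is encoded to a 64 x 64 x 4 latent, since 512 / 8 = 64; the decoder maps the denoised latent back to pixel space after generation.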

  • Optimized performance on Apple Silicon through Core ML conversion
  • Enhanced safety filters with punsafe=0.98 threshold
  • Compatible with Neural Engine through split_einsum implementation
  • Supports 512x512 resolution image generation

Core Capabilities

  • High-quality text-to-image generation
  • Efficient processing on Apple devices
  • Research and artistic applications
  • Educational and creative tool integration

Frequently Asked Questions

Q: What makes this model unique?

Its value lies in its optimization for Apple Silicon devices through Core ML conversion. It offers both split_einsum and original variants for different compute-unit compatibility while preserving the generation quality of Stable Diffusion 2.1.

Q: What are the recommended use cases?

The model is recommended for research purposes, artistic creation, educational tools, and design applications. It's particularly suitable for users working on Apple Silicon devices who need efficient, local text-to-image generation capabilities.
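For programmatic local generation, a minimal Swift sketch is shown below, assuming Apple's open-source ml-stable-diffusion package and its StableDiffusionPipeline type; the resource path is a placeholder, and exact initializer and argument names vary across package versions, so treat this as an outline rather than a drop-in snippet.

```swift
import CoreML
import Foundation
import StableDiffusion  // Swift package from github.com/apple/ml-stable-diffusion

// Placeholder: directory holding this repo's converted resources
// (text encoder, U-Net, VAE decoder, and so on).
let resourceURL = URL(fileURLWithPath: "/path/to/coreml-stable-diffusion-2-1-base")

do {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine  // pair with the split_einsum variant

    let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL,
                                               configuration: config)
    try pipeline.loadResources()

    // Generate one 512x512 image from a text prompt.
    let images = try pipeline.generateImages(prompt: "a watercolor lighthouse at dawn",
                                             seed: 42)
    if let image = images.first ?? nil {
        print("Generated \(image.width)x\(image.height) image")
    }
} catch {
    print("Generation failed: \(error)")
}
```

For a no-code path, Mochi Diffusion wraps the same converted resources in a macOS GUI.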
