CoreML Stable Diffusion 2 Base
| Property | Value |
|---|---|
| Author | Apple |
| License | CreativeML Open RAIL++-M |
| Primary Paper | High-Resolution Image Synthesis With Latent Diffusion Models |
| Framework | Core ML |
What is coreml-stable-diffusion-2-base?
This is an Apple Silicon-optimized version of Stable Diffusion v2 (base), converted to the Core ML format for efficient on-device deployment. It retains the text-to-image generation capabilities of the original model while leveraging Apple's hardware acceleration across the CPU, GPU, and Neural Engine.
Implementation Details
The model was trained on a filtered subset of the LAION-5B dataset: 550k steps at 256x256 resolution, followed by 850k steps at 512x512 resolution. It uses a latent diffusion architecture with an autoencoder and a UNet backbone, paired with an OpenCLIP-ViT/H text encoder.
- Offers both `original` and `split_einsum` attention variants (the latter is optimized for the Neural Engine)
- Supports both Swift and Python inference paths
- Trained with conservative NSFW filtering (p_unsafe=0.1)
- Uses v-objective for improved generation quality
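For the Python inference path, Apple's `ml-stable-diffusion` reference package provides a `pipeline` CLI. A minimal invocation might look like the following sketch; the local model directory and output path are illustrative, and exact flags may differ across package versions:

```shell
# Install Apple's reference package (installation method is an assumption;
# cloning the repo and running `pip install -e .` also works)
pip install git+https://github.com/apple/ml-stable-diffusion

# Generate an image from the Core ML weights; -i points at the downloaded
# "original" or "split_einsum" package directory (illustrative path)
python -m python_coreml_stable_diffusion.pipeline \
  --prompt "an astronaut riding a horse on mars" \
  -i ./coreml-stable-diffusion-2-base/original/packages \
  -o ./output \
  --compute-unit ALL \
  --model-version stabilityai/stable-diffusion-2-base \
  --seed 93
```

The `--model-version` flag tells the pipeline which Hugging Face checkpoint supplies the tokenizer and scheduler configuration.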
Core Capabilities
- High-quality text-to-image generation at 512x512 resolution
- Optimized performance on Apple Silicon hardware
- Filtered training data for safer content generation
- Multiple deployment options for different use cases
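For Swift deployment, the same `ml-stable-diffusion` package exposes a `StableDiffusionPipeline` type. The sketch below assumes a directory of compiled `.mlmodelc` resources; the resource path is illustrative, and initializer parameter names have changed across package versions:

```swift
import CoreML
import StableDiffusion

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // favor the Neural Engine on Apple Silicon

// Directory containing the compiled .mlmodelc resources (illustrative path)
let resources = URL(fileURLWithPath: "coreml-stable-diffusion-2-base/original/compiled")

let pipeline = try StableDiffusionPipeline(
    resourcesAt: resources,
    configuration: config,
    reduceMemory: true)   // trades speed for a lower peak-memory footprint
try pipeline.loadResources()

var generation = StableDiffusionPipeline.Configuration(
    prompt: "an astronaut riding a horse on mars")
generation.stepCount = 25
generation.seed = 93

let images = try pipeline.generateImages(configuration: generation)
```

Choosing `.cpuAndNeuralEngine` pairs naturally with the `split_einsum` attention variant, while `.cpuAndGPU` is generally the better match for the `original` variant.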
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specific optimization for Apple Silicon hardware through Core ML conversion, offering efficient deployment while maintaining the quality of Stable Diffusion v2. It provides multiple variants for different use cases and performance requirements.
Q: What are the recommended use cases?
The model is intended for research purposes, creative tools, educational applications, and artistic processes. It's specifically designed for deployment on Apple devices where efficient, local processing is required. However, it should not be used for generating harmful, offensive, or inappropriate content.