coreml-stable-diffusion-v1-5

Maintained by: apple

CoreML Stable Diffusion v1.5

License: CreativeML OpenRAIL M
Architecture: Latent Diffusion Model
Author: Apple
Paper: High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022)

What is coreml-stable-diffusion-v1-5?

This is an optimized version of Stable Diffusion v1.5 designed for Apple Silicon hardware using Core ML. The model enables efficient text-to-image generation and ships with two attention-mechanism variants: original and split_einsum. It was fine-tuned for 595k steps at 512x512 resolution on the LAION-Aesthetics v2 5+ dataset.

Implementation Details

The model combines an autoencoder with a diffusion model trained in latent space, using a ViT-L/14 text encoder to process prompts. It is distributed in four deployment variants: compiled and packaged versions of both the original and split_einsum attention mechanisms.

  • Supports both Swift and Python inference
  • Optimized for Apple Silicon processors
  • Uses relative downsampling factor of 8
  • Maps images to latents of shape H/f x W/f x 4
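The latent geometry above follows directly from the downsampling factor: with f = 8, a 512x512 image maps to a 64x64x4 latent. A small sketch of that arithmetic (the helper name is illustrative, not part of any Apple API):

```python
def latent_shape(height, width, f=8, channels=4):
    """Shape H/f x W/f x 4 of the latent tensor for an image of
    size height x width, given relative downsampling factor f."""
    return (height // f, width // f, channels)

print(latent_shape(512, 512))  # (64, 64, 4)
```

This is why the diffusion model is cheap to run relative to pixel-space models: it denoises a tensor 64x smaller in spatial extent than the output image.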

Core Capabilities

  • High-quality text-to-image generation
  • 512x512 resolution output
  • Efficient processing on Apple hardware
  • Classifier-free guidance sampling
  • Built-in safety mechanisms

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its optimization for Apple Silicon hardware through Core ML, offering efficient local inference while preserving the output quality of the original Stable Diffusion v1.5.

Q: What are the recommended use cases?

The model is intended for research purposes, including safe deployment testing, artistic applications, educational tools, and research on generative models. It should not be used for creating harmful or offensive content.
