coreml-stable-diffusion-v1-5

Maintained by: apple

CoreML Stable Diffusion v1.5

License: CreativeML OpenRAIL M
Architecture: Latent Diffusion Model
Author: Apple
Paper: High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022)

What is coreml-stable-diffusion-v1-5?

This is an optimized version of Stable Diffusion v1.5 designed for Apple Silicon hardware using Core ML. The model enables efficient text-to-image generation and ships with two attention-mechanism variants: original and split_einsum. It was fine-tuned for 595k steps at 512x512 resolution on the LAION-Aesthetics v2 5+ dataset.

Implementation Details

The model combines an autoencoder with a diffusion model trained in latent space, using a ViT-L/14 text encoder to process prompts. It is distributed in four deployment variants: compiled and packaged versions of both the original and split_einsum attention mechanisms.

  • Supports both Swift and Python inference
  • Optimized for Apple Silicon processors
  • Uses relative downsampling factor of 8
  • Maps images to latents of shape H/f x W/f x 4
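The latent geometry above follows directly from the downsampling factor: with f = 8, a 512x512 image maps to a 64x64x4 latent. A small sketch of that arithmetic (the helper name is illustrative, not part of any Apple API):

```python
def latent_shape(height, width, f=8, channels=4):
    """Shape H/f x W/f x 4 of the latent tensor for an image of
    size height x width, given relative downsampling factor f."""
    return (height // f, width // f, channels)

print(latent_shape(512, 512))  # (64, 64, 4)
```

This is why the diffusion model is cheap to run relative to pixel-space models: it denoises a tensor 64x smaller in spatial extent than the output image.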

Core Capabilities

  • High-quality text-to-image generation
  • 512x512 resolution output
  • Efficient processing on Apple hardware
  • Classifier-free guidance sampling
  • Built-in safety mechanisms

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its optimization for Apple Silicon hardware through Core ML, offering efficient local inference while preserving the output quality of the original Stable Diffusion v1.5.

Q: What are the recommended use cases?

The model is intended for research purposes, including safe deployment testing, artistic applications, educational tools, and research on generative models. It should not be used for creating harmful or offensive content.
