CoreML Stable Diffusion 2 Base
| Property | Value |
|---|---|
| Author | Apple |
| License | CreativeML Open RAIL++-M |
| Primary Paper | High-Resolution Image Synthesis With Latent Diffusion Models |
| Framework | Core ML |
What is coreml-stable-diffusion-2-base?
This is an Apple Silicon-optimized version of Stable Diffusion v2 (base), converted to the Core ML format for efficient on-device deployment. It retains the text-to-image generation capabilities of the original model while leveraging Apple's hardware acceleration across the CPU, GPU, and Neural Engine.
Implementation Details
The model was trained on a filtered subset of the LAION-5B dataset: 550k steps at 256x256 resolution, followed by 850k steps at 512x512 resolution. It uses a latent diffusion architecture with an autoencoder and a UNet backbone, paired with an OpenCLIP-ViT/H text encoder.
- Offers both `original` and `split_einsum` attention variants (the latter is optimized for the Neural Engine)
- Supports both Swift and Python inference paths
- Trained with conservative NSFW filtering (p_unsafe=0.1)
- Uses v-objective for improved generation quality
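For the Python inference path, Apple's `ml-stable-diffusion` reference package provides a `pipeline` CLI. A minimal invocation might look like the following sketch; the local model directory and output path are illustrative, and exact flags may differ across package versions:

```shell
# Install Apple's reference package (installation method is an assumption;
# cloning the repo and running `pip install -e .` also works)
pip install git+https://github.com/apple/ml-stable-diffusion

# Generate an image from the Core ML weights; -i points at the downloaded
# "original" or "split_einsum" package directory (illustrative path)
python -m python_coreml_stable_diffusion.pipeline \
  --prompt "an astronaut riding a horse on mars" \
  -i ./coreml-stable-diffusion-2-base/original/packages \
  -o ./output \
  --compute-unit ALL \
  --model-version stabilityai/stable-diffusion-2-base \
  --seed 93
```

The `--model-version` flag tells the pipeline which Hugging Face checkpoint supplies the tokenizer and scheduler configuration.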
Core Capabilities
- High-quality text-to-image generation at 512x512 resolution
- Optimized performance on Apple Silicon hardware
- Filtered training data for safer content generation
- Multiple deployment options for different use cases
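For Swift deployment, the same `ml-stable-diffusion` package exposes a `StableDiffusionPipeline` type. The sketch below assumes a directory of compiled `.mlmodelc` resources; the resource path is illustrative, and initializer parameter names have changed across package versions:

```swift
import CoreML
import StableDiffusion

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // favor the Neural Engine on Apple Silicon

// Directory containing the compiled .mlmodelc resources (illustrative path)
let resources = URL(fileURLWithPath: "coreml-stable-diffusion-2-base/original/compiled")

let pipeline = try StableDiffusionPipeline(
    resourcesAt: resources,
    configuration: config,
    reduceMemory: true)   // trades speed for a lower peak-memory footprint
try pipeline.loadResources()

var generation = StableDiffusionPipeline.Configuration(
    prompt: "an astronaut riding a horse on mars")
generation.stepCount = 25
generation.seed = 93

let images = try pipeline.generateImages(configuration: generation)
```

Choosing `.cpuAndNeuralEngine` pairs naturally with the `split_einsum` attention variant, while `.cpuAndGPU` is generally the better match for the `original` variant.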
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specific optimization for Apple Silicon hardware through Core ML conversion, offering efficient deployment while maintaining the quality of Stable Diffusion v2. It provides multiple variants for different use cases and performance requirements.
Q: What are the recommended use cases?
The model is intended for research purposes, creative tools, educational applications, and artistic processes. It's specifically designed for deployment on Apple devices where efficient, local processing is required. However, it should not be used for generating harmful, offensive, or inappropriate content.