SoteDiffusion V2
Property | Value |
---|---|
Model Type | Anime Image Generation |
Base Architecture | Würstchen V3 / Stable Cascade |
Training Data | 12M text-image pairs |
License | Fair AI Public License 1.0-SD |
Model URL | https://huggingface.co/Disty0/sotediffusion-v2 |
What is sotediffusion-v2?
SoteDiffusion V2 is an anime-focused image generation model built on the Würstchen V3 / Stable Cascade architecture. It was trained in full FP32 precision with MAE loss on 12 million text-image pairs, using 8 H100 80GB SXM5 GPUs. Training captions combine WD tags with natural language captions.
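For reference, MAE (mean absolute error) averages the absolute differences between predictions and targets, where MSE averages the squared differences. The sketch below shows only the formula on plain Python lists. The description above does not say which tensors the loss compares during training, so the inputs and the function name `mae_loss` are purely illustrative.

```python
def mae_loss(pred, target):
    """Mean absolute error between two equal-length sequences of numbers."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    # Average of |prediction - target| over all elements.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Illustrative values only, not taken from the model's training run.
example = mae_loss([0.0, 2.0], [1.0, 4.0])  # (1 + 2) / 2
```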
Implementation Details
The model uses a three-stage architecture (Stage A, Stage B, and Stage C) tuned for anime image generation. It supports long prompts and can be run in ComfyUI, SD.Next, and Diffusers.
- Stage C uses the DPMPP 2M sampler with 28 steps and a CFG scale of 6.0
- Stage B uses LCM with the Exponential scheduler, 14 steps, and a CFG scale of 1.0
- Supports resolutions that are multiples of 128
- Uses aesthetic and quality tags
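The step counts, CFG values, and resolution rule listed above can be collected into a small configuration sketch. The keyword names in `build_call_kwargs` (`prior_num_inference_steps`, `prior_guidance_scale`, `num_inference_steps`, `decoder_guidance_scale`) are assumed to match the call signature of Diffusers' `StableCascadeCombinedPipeline`, with the prior taken to be Stage C and the decoder Stage B. Verify both assumptions against your installed Diffusers version. `StageSettings`, `snap_to_128`, and `build_call_kwargs` are hypothetical names used only for illustration, and sampler or scheduler selection is not covered.

```python
from dataclasses import dataclass

RESOLUTION_STEP = 128  # supported resolutions are multiples of 128


@dataclass(frozen=True)
class StageSettings:
    steps: int
    cfg: float


# Values taken from the list above.
STAGE_C = StageSettings(steps=28, cfg=6.0)
STAGE_B = StageSettings(steps=14, cfg=1.0)


def snap_to_128(value: int) -> int:
    """Round a pixel dimension to the nearest multiple of 128 (minimum 128)."""
    snapped = (value + RESOLUTION_STEP // 2) // RESOLUTION_STEP * RESOLUTION_STEP
    return max(RESOLUTION_STEP, snapped)


def build_call_kwargs(prompt: str, negative_prompt: str, width: int, height: int) -> dict:
    """Build keyword arguments for a combined prior + decoder pipeline call.

    Assumption: the key names follow Diffusers' StableCascadeCombinedPipeline.
    """
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": snap_to_128(width),
        "height": snap_to_128(height),
        "prior_num_inference_steps": STAGE_C.steps,
        "prior_guidance_scale": STAGE_C.cfg,
        "num_inference_steps": STAGE_B.steps,
        "decoder_guidance_scale": STAGE_B.cfg,
    }


kwargs = build_call_kwargs("1girl, solo", "realistic", width=1000, height=1500)
```

If the assumption about the pipeline signature holds, the resulting dictionary could be passed as `pipe(**kwargs)` to a pipeline loaded from the model URL given in the table above.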
Core Capabilities
- High-quality anime image generation with detailed character features
- Support for long prompts
- Multiple quality levels and an aesthetic scoring system
- Specialized handling of anime-specific attributes and style elements
Frequently Asked Questions
Q: What makes this model unique?
The model pairs the Würstchen V3 architecture with anime-specific training in full FP32 precision. It is prompted with a tag system and also includes an aesthetic scoring system and quality classification.
Q: What are the recommended use cases?
The model is best suited to anime-style artwork and illustrations, particularly character-focused images. Adding "realistic" to the negative prompt may be needed to avoid realistic renderings.
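One way to apply the negative-prompt advice above is a small helper that appends `realistic` when it is missing. This is a minimal sketch: the function name `with_realistic_negative` is hypothetical, and it assumes prompts are written as comma-separated tags.

```python
def with_realistic_negative(negative_prompt: str, tag: str = "realistic") -> str:
    """Return the negative prompt with `tag` appended unless it is already present."""
    # Split on commas and drop empty entries and surrounding whitespace.
    tags = [t.strip() for t in negative_prompt.split(",") if t.strip()]
    # Case-insensitive check so "Realistic" is not duplicated.
    if tag.lower() not in (t.lower() for t in tags):
        tags.append(tag)
    return ", ".join(tags)


negative = with_realistic_negative("lowres, blurry")
```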