sotediffusion-v2

Disty0

SoteDiffusion V2 is an anime-focused fine-tune of Würstchen V3/Stable Cascade. It was trained on 12M image-text pairs in full FP32 precision with an MAE loss, using 8x H100 GPUs.

Property	Value
Model Type	Anime Image Generation
Base Architecture	Würstchen V3 / Stable Cascade
Training Data	12M text-image pairs
License	Fair AI Public License 1.0-SD
Model URL	https://huggingface.co/Disty0/sotediffusion-v2

What is sotediffusion-v2?

SoteDiffusion V2 is an anime-focused image generation model built on the Würstchen V3/Stable Cascade architecture. It was trained in full FP32 precision with an MAE (mean absolute error) loss on 12 million text-image pairs, using 8 H100 80GB SXM5 GPUs. The training captions combine WD tags with natural-language descriptions.
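MAE (mean absolute error, also called L1) loss averages the absolute differences between a prediction and its target, whereas MSE averages the squared differences. The card does not say what the model's prediction target is, so the sketch below compares two flat lists of numbers; `mae_loss` and `mse_loss` are illustrative helpers, not the model's training code.

```python
def mae_loss(pred: list[float], target: list[float]) -> float:
    """Mean absolute error (L1): the average of |pred - target|."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred: list[float], target: list[float]) -> float:
    """Mean squared error (L2), shown only for comparison."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    pred = [0.5, -1.0, 2.0]
    target = [0.0, -1.0, 0.0]
    print(mae_loss(pred, target))  # 2.5 / 3, about 0.833
    print(mse_loss(pred, target))  # 4.25 / 3, about 1.417
```

In this example the error of 2.0 adds 2.0 to the MAE sum but 4.0 to the MSE sum, so MAE penalizes large individual errors less heavily than MSE does.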

Implementation Details

The model uses Stable Cascade's three-stage architecture (Stages A, B, and C) with settings tuned for anime image generation, and its prompt encoding handles long prompts. It can be run in ComfyUI, SD.Next, and Diffusers.
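To make the three stages concrete, the sketch below traces how a target image size maps through them. It assumes the default constants of the Stable Cascade pipelines in Diffusers (`resolution_multiple = 42.67` for the Stage C prior, `latent_dim_scale = 10.67` for the Stage B decoder) and a 4x upscale in Stage A; SoteDiffusion V2 inherits the architecture, but these numbers are not stated on this card. `stage_shapes` is a hypothetical helper.

```python
import math

# Assumed defaults from Diffusers' Stable Cascade pipelines (not stated on this card).
STAGE_C_RESOLUTION_MULTIPLE = 42.67  # image pixels per Stage C latent cell
STAGE_B_LATENT_DIM_SCALE = 10.67     # Stage B latent cells per Stage C latent cell
STAGE_A_UPSCALE = 4                  # Stage A decodes each Stage B cell to 4x4 pixels


def stage_shapes(height: int, width: int) -> dict[str, tuple[int, int]]:
    """Return the spatial (height, width) handled by each stage for a target size."""
    # Stage C: text-conditioned generation in a highly compressed latent space.
    stage_c = (math.ceil(height / STAGE_C_RESOLUTION_MULTIPLE),
               math.ceil(width / STAGE_C_RESOLUTION_MULTIPLE))
    # Stage B: expands the Stage C output into a larger latent.
    stage_b = (int(stage_c[0] * STAGE_B_LATENT_DIM_SCALE),
               int(stage_c[1] * STAGE_B_LATENT_DIM_SCALE))
    # Stage A: decodes the Stage B latent into pixels.
    image = (stage_b[0] * STAGE_A_UPSCALE, stage_b[1] * STAGE_A_UPSCALE)
    return {"stage_c": stage_c, "stage_b": stage_b, "image": image}


if __name__ == "__main__":
    print(stage_shapes(1024, 1024))
    print(stage_shapes(1000, 1000))
```

Under these assumed constants, `stage_shapes(1024, 1024)` gives a 24x24 Stage C latent, a 256x256 Stage B latent, and a 1024x1024 image. Sizes that are multiples of 128 come back unchanged, while a request for 1000x1000 comes out as 1024x1024.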

Stage C: DPM++ 2M sampler, 28 steps, CFG 6.0
Stage B: LCM sampler with the Exponential scheduler, 14 steps, CFG 1.0
Resolution: width and height should be multiples of 128
Prompting: supports aesthetic and quality tags
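The recommended per-stage settings above can be kept in a small configuration object, together with a check that a requested size is a multiple of 128. `StageSettings`, `RECOMMENDED`, `snap_to_multiple`, and `is_supported_resolution` are hypothetical names, and the sampler and scheduler strings are descriptive labels rather than identifiers from a particular UI or library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageSettings:
    """Sampling settings for one Stable Cascade stage (hypothetical container)."""
    sampler: str
    scheduler: Optional[str]  # None: the card names no scheduler for this stage
    steps: int
    cfg: float


# Values taken from the card's recommendations.
RECOMMENDED = {
    "stage_c": StageSettings(sampler="DPM++ 2M", scheduler=None, steps=28, cfg=6.0),
    "stage_b": StageSettings(sampler="LCM", scheduler="Exponential", steps=14, cfg=1.0),
}


def snap_to_multiple(value: int, multiple: int = 128) -> int:
    """Round a dimension to the nearest multiple (halves round up), never below one multiple."""
    return max(multiple, (value + multiple // 2) // multiple * multiple)


def is_supported_resolution(height: int, width: int, multiple: int = 128) -> bool:
    """True if both dimensions are positive multiples of `multiple`."""
    return all(d > 0 and d % multiple == 0 for d in (height, width))
```

For example, `snap_to_multiple(1000)` returns 1024 and `is_supported_resolution(1152, 896)` returns True. Rounding to the nearest multiple is one choice; rounding down instead would keep the image within a fixed memory budget.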

Core Capabilities

High-quality anime image generation with detailed character features
Support for long prompts
Multiple quality levels and an aesthetic scoring system, usable as prompt tags
Specialized handling of anime-specific attributes and style elements
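The card does not describe how long prompts are encoded. A common approach in Stable Diffusion web UIs is to split the token sequence into windows that fit the text encoder's 77-token context (75 content tokens plus start and end tokens), encode each window separately, and concatenate the results. The sketch below shows only the splitting step on integer token IDs; it assumes the CLIP tokenizer's start/end IDs (49406/49407) and end-token padding, `chunk_prompt_tokens` is a hypothetical helper, and this is not SoteDiffusion's documented method.

```python
def chunk_prompt_tokens(
    token_ids: list[int],
    chunk_size: int = 75,
    bos_id: int = 49406,  # CLIP <|startoftext|>
    eos_id: int = 49407,  # CLIP <|endoftext|>, also used as padding here (assumption)
) -> list[list[int]]:
    """Split content token IDs into fixed-size windows wrapped in BOS/EOS.

    Each window holds chunk_size + 2 IDs, so the default matches a
    77-token text-encoder context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    # Emit at least one window, even for an empty prompt.
    for start in range(0, max(len(token_ids), 1), chunk_size):
        body = token_ids[start:start + chunk_size]
        body = body + [eos_id] * (chunk_size - len(body))  # pad the final window
        chunks.append([bos_id] + body + [eos_id])
    return chunks
```

A 160-token prompt yields three 77-token windows. Each window would then go through the text encoder and the embeddings would be concatenated along the sequence axis; that step depends on the encoder library and is left out here.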

Frequently Asked Questions

Q: What makes this model unique?

The model pairs the Würstchen V3 architecture with anime-specific training in full FP32 precision. Its captions use a tag system that includes aesthetic scores and quality classifications, which can then be used in prompts.

Q: What are the recommended use cases?

The model is best at high-quality anime illustrations, particularly character-focused images, and is suited to anime-style artwork in general. Adding "realistic" to the negative prompt may be needed to avoid realistic-looking renderings.
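As a sketch of how the prompting advice fits together, the helper below joins quality and aesthetic tags, WD-style tags, and a natural-language caption into one prompt, and always adds "realistic" to the negative prompt. The card does not give the exact tag vocabulary or ordering, so the ordering here is an assumption, the quality tag in the example is a placeholder, and `build_prompts` is a hypothetical helper.

```python
from typing import Optional


def _dedupe(tags: list[str]) -> list[str]:
    """Strip whitespace, drop empty tags, and remove repeats while keeping order."""
    seen: set[str] = set()
    out: list[str] = []
    for tag in tags:
        t = tag.strip()
        if t and t not in seen:
            seen.add(t)
            out.append(t)
    return out


def build_prompts(
    quality_tags: list[str],
    wd_tags: list[str],
    caption: str = "",
    negative_tags: Optional[list[str]] = None,
) -> tuple[str, str]:
    """Return (positive, negative) prompt strings.

    Assumed ordering: quality/aesthetic tags, then WD tags, then the caption.
    """
    positive = _dedupe(list(quality_tags) + list(wd_tags))
    if caption.strip():
        positive.append(caption.strip())
    # Add "realistic" to steer away from realistic renderings.
    negative = _dedupe(list(negative_tags or []) + ["realistic"])
    return ", ".join(positive), ", ".join(negative)


if __name__ == "__main__":
    pos, neg = build_prompts(
        ["QUALITY_TAG_PLACEHOLDER"],
        ["1girl", "solo", "long hair"],
        "a girl reading under a cherry tree",
        ["lowres"],
    )
    print(pos)
    print(neg)
```

Deduplicating keeps the prompt short when the same tag arrives from more than one source, for example "realistic" supplied by the caller and added again by the helper.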