cafe-instagram-sd-1-5-v6

cafeai

A Stable Diffusion 1.5-based model fine-tuned on 1.2M Instagram images for Japanese idol and fashion photos, captioned with BLIP descriptions and booru tags.

Property	Value
License	AGPL-3.0
Base Model	runwayml/stable-diffusion-v1-5
Training Data	1.2M Instagram images
Author	cafeai

What is cafe-instagram-sd-1-5-v6?

cafe-instagram-sd-1-5-v6 is a Stable Diffusion model fine-tuned to generate Instagram-style Japanese idol and fashion photography. It was fine-tuned from runwayml/stable-diffusion-v1-5 for approximately 1.6 epochs on 1.2M curated Instagram images, captioned with BLIP natural-language descriptions and booru tags.

Implementation Details

The model was trained on a range of aspect ratios at a base resolution of 768x768, using the output of the penultimate CLIP layer. For best results, use a clip skip of 2 and a resolution of 768x768 or higher.

Trained on diverse aspect ratios with 768x768 base resolution
Training captions combine BLIP descriptions with booru tags
Incorporates Instagram hashtags in training data
Uses the penultimate CLIP layer (clip skip 2)
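
The settings above can be sketched in Python. This is a minimal illustration, not the author's tooling. It assumes that sizes for non-square images are chosen by keeping the pixel area near 768x768 and snapping each side to a multiple of 64 (a common bucketing convention; the card does not state the rule actually used). It also assumes generation through Hugging Face diffusers, where clip_skip=1 selects the penultimate CLIP layer, so a WebUI-style "clip skip 2" becomes 1. The function names and the model_id argument are hypothetical, and whether diffusers-format weights exist for this model is not confirmed by the card.

```python
import math

BASE = 768  # base training resolution stated in the card
STEP = 64   # assumed snapping granularity, not taken from the card


def bucket_size(aspect_ratio: float, base: int = BASE, step: int = STEP) -> tuple:
    """Return (width, height) with area close to base*base and width/height near aspect_ratio."""
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    scale = math.sqrt(aspect_ratio)
    width, height = base * scale, base / scale

    def snap(value: float) -> int:
        # Round to the nearest multiple of `step`, never below one step.
        return max(step, int(round(value / step)) * step)

    return snap(width), snap(height)


def diffusers_clip_skip(webui_clip_skip: int) -> int:
    """Map WebUI-style clip skip (1 = last layer) to the diffusers clip_skip value."""
    if webui_clip_skip < 1:
        raise ValueError("WebUI clip skip starts at 1")
    return webui_clip_skip - 1


def generate(model_id: str, prompt: str, aspect_ratio: float = 1.0):
    """Hypothetical helper: model_id must point to diffusers-format weights."""
    # Imported lazily so the helpers above work without diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    width, height = bucket_size(aspect_ratio)
    result = pipe(
        prompt,
        width=width,
        height=height,
        clip_skip=diffusers_clip_skip(2),  # penultimate CLIP layer
    )
    return result.images[0]
```

For example, bucket_size(2 / 3) gives a 640x960 portrait size and bucket_size(16 / 9) gives 1024x576. Under this rule one side can fall below 768 while the total area stays close to the base.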

Core Capabilities

Generation of photorealistic Japanese idol and fashion photography
Support for various Instagram-style aesthetics
Enhanced performance with specific prompt structures
Specialized in generating realistic portraits and fashion shots
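
The card does not document which prompt structures work best. Since the training text combined BLIP descriptions, booru tags, and Instagram hashtags, one plausible layout is a natural-language caption followed by tags and hashtags. The sketch below assembles such a prompt. The ordering, the comma separator, and the build_prompt name are all assumptions made for illustration.

```python
def build_prompt(caption: str, tags=(), hashtags=()) -> str:
    """Join a caption, booru-style tags, and hashtags into one prompt string.

    The layout (caption, then tags, then hashtags) is an assumed convention,
    not a format documented for this model.
    """
    parts = [caption.strip().rstrip(".")]
    # Booru tags are passed through unchanged, skipping empty entries.
    parts += [tag.strip() for tag in tags if tag.strip()]
    # Ensure every hashtag carries exactly one leading '#'.
    parts += ["#" + h.strip().lstrip("#") for h in hashtags if h.strip()]
    return ", ".join(part for part in parts if part)


prompt = build_prompt(
    "a woman in a beige coat standing on a street.",
    tags=["1girl", "long_hair"],
    hashtags=["ootd", "#fashion"],
)
print(prompt)
```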

Frequently Asked Questions

Q: What makes this model unique?

The model specializes in Instagram-style Japanese idol and fashion photography. It was trained on real Instagram content captioned with BLIP descriptions and booru tags, which makes it particularly effective at producing realistic social-media-style images.

Q: What are the recommended use cases?

The model is best suited for generating fashion photography, idol portraits, and Instagram-style content. Use a clip skip of 2 and a resolution of 768x768 or higher. Because the model is undertrained, mixing it with other models may improve results.
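
The card does not say how to mix the model with others. One common approach is a weighted-sum checkpoint merge, in which every parameter becomes (1 - alpha) * A + alpha * B. The sketch below shows that rule on plain lists of floats that stand in for weight tensors. The merge_weighted name, the alpha value, and the toy state dicts are hypothetical; real checkpoints would be loaded with a tensor library and must share the same architecture.

```python
def merge_weighted(state_a: dict, state_b: dict, alpha: float) -> dict:
    """Weighted-sum merge: (1 - alpha) * A + alpha * B for every parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must have identical parameter names")

    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        if len(a) != len(b):
            raise ValueError("shape mismatch for parameter " + name)
        merged[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


# Toy stand-ins for two Stable Diffusion 1.5 checkpoints.
instagram_model = {"layer.weight": [0.0, 2.0]}
other_model = {"layer.weight": [1.0, 4.0]}

# alpha = 0.25 keeps 75% of the first model and 25% of the second.
mixed = merge_weighted(instagram_model, other_model, alpha=0.25)
print(mixed)
```

A lower alpha keeps more of this model's Instagram style, while a higher alpha leans on the other model. The best ratio is something to find by experiment.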