wd-vit-tagger-v3

Maintained by: SmilingWolf

WD ViT Tagger v3

Property          Value
Parameter Count   94.6M
License           Apache-2.0
Framework         timm, ONNX, Safetensors
Tensor Type       F32

What is wd-vit-tagger-v3?

WD ViT Tagger v3 is a Vision Transformer (ViT) model for multi-label image classification, trained specifically to tag anime and manga artwork. Developed by SmilingWolf, it improves on earlier versions of the WD tagger and reaches an F1 score of 0.4402 on its validation set at a classification threshold of 0.2614.
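To make the reported metric concrete, the sketch below computes an F1 score for multi-label predictions at a fixed threshold: a tag counts as predicted when its score reaches the threshold. It assumes micro-averaging over every (image, tag) pair; the description does not say how the F1 score is averaged, and the function name `f1_at_threshold` is hypothetical.

```python
from typing import Sequence


def f1_at_threshold(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    threshold: float = 0.2614,
) -> float:
    """Micro-averaged F1 over every (image, tag) pair.

    Assumption: micro-averaging; the averaging scheme is not stated in the card.
    """
    tp = fp = fn = 0
    for image_scores, image_labels in zip(scores, labels):
        for score, label in zip(image_scores, image_labels):
            predicted = score >= threshold  # tag is emitted when its score clears the threshold
            if predicted and label:
                tp += 1
            elif predicted and not label:
                fp += 1
            elif label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two images, three tags each (toy numbers, not real model output).
scores = [[0.9, 0.3, 0.1], [0.2, 0.8, 0.05]]
labels = [[1, 0, 0], [1, 1, 0]]
print(round(f1_at_threshold(scores, labels), 4))  # 2 TP, 1 FP, 1 FN -> 0.6667
```

Raising the threshold trades recall for precision, so the F1 figure is only meaningful together with the threshold it was measured at.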

Implementation Details

The model was trained with the JAX-CV framework on TPUs provided through the TPU Research Cloud (TRC) program. It was trained on Danbooru images and predicts three kinds of tags: ratings, characters, and general tags. Training used images with IDs modulo 0000-0899, and validation used images with IDs modulo 0950-0999.
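The ID-based split can be sketched as a pure function of the post ID. This assumes "IDs modulo 0000-0899" means `post_id % 1000` falls in 0-899 (and 950-999 for validation); that reading, the function name `split_for_post_id`, and the "unassigned" label for the 0900-0949 range (which the description does not mention) are all illustrative assumptions.

```python
def split_for_post_id(post_id: int) -> str:
    """Assign a post to a split by its ID.

    Assumption: "IDs modulo 0000-0899" is read as post_id % 1000 in [0, 899].
    """
    bucket = post_id % 1000
    if bucket <= 899:
        return "train"
    if 950 <= bucket <= 999:
        return "validation"
    # 0900-0949: the description does not say how this range is used.
    return "unassigned"


for post_id in (1_234_567, 7_000_955, 42_920):
    print(post_id, split_for_post_id(post_id))  # train, validation, unassigned
```

A deterministic split like this keeps a given image on the same side of the train/validation boundary across dataset refreshes, without storing a separate split list.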

  • Trained only on images with at least 10 general tags
  • Tags with fewer than 600 images were filtered out
  • Uses tag frequency-based loss scaling
  • Usable from both timm and ONNX Runtime
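The two dataset rules above (at least 600 images per tag, at least 10 general tags per image) can be sketched as a single filtering pass. The order of the two steps, the restriction to general tags only, the dict-of-sets data layout, and the name `filter_dataset` are assumptions for illustration; only the thresholds 600 and 10 come from the description.

```python
from collections import Counter
from typing import Dict, Set

MIN_IMAGES_PER_TAG = 600  # from the model card
MIN_GENERAL_TAGS = 10     # from the model card


def filter_dataset(
    general_tags: Dict[int, Set[str]],
    min_images_per_tag: int = MIN_IMAGES_PER_TAG,
    min_general_tags: int = MIN_GENERAL_TAGS,
) -> Dict[int, Set[str]]:
    """Hypothetical filter over a mapping of image id -> set of general tags.

    Assumption: rare tags are dropped first, then each image is checked
    against the minimum general-tag count using only the surviving tags.
    """
    counts = Counter(tag for tags in general_tags.values() for tag in tags)
    kept_tags = {tag for tag, n in counts.items() if n >= min_images_per_tag}
    filtered: Dict[int, Set[str]] = {}
    for image_id, tags in general_tags.items():
        remaining = tags & kept_tags
        if len(remaining) >= min_general_tags:
            filtered[image_id] = remaining
    return filtered


# Toy data with lowered thresholds so the effect is visible.
toy = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "d"}}
print(filter_dataset(toy, min_images_per_tag=2, min_general_tags=2))
```

In the toy run, tags "c" and "d" appear in only one image each and are dropped, which leaves image 3 with a single tag, so it is excluded as well.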

Core Capabilities

  • Multi-label image classification
  • Supports batch inference in ONNX format
  • Handles ratings, character identification, and general tag assignment
  • Improved class imbalance handling through frequency-based loss scaling
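The description does not specify how the frequency-based loss scaling works. As one common form of the idea, the sketch below gives each tag a weight that grows as the tag gets rarer and scales that tag's binary cross-entropy term by it. The formula `(max_count / count) ** power`, the `power` value, and both function names are hypothetical, not the scheme used to train this model.

```python
import math
from typing import List, Sequence


def frequency_weights(tag_counts: Sequence[int], power: float = 0.5) -> List[float]:
    """Hypothetical per-tag weights: rarer tags get larger weights.

    weight_j = (max_count / count_j) ** power, so the most frequent tag gets 1.0.
    """
    max_count = max(tag_counts)
    return [(max_count / c) ** power for c in tag_counts]


def weighted_bce(
    probs: Sequence[float],
    targets: Sequence[int],
    weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy with each tag's term scaled by its weight."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


weights = frequency_weights([40_000, 10_000, 625])
print(weights)  # [1.0, 2.0, 8.0]
print(weighted_bce([0.9, 0.4, 0.3], [1, 0, 1], weights))
```

With this weighting, a missed rare tag costs several times more than a missed common one, which pushes the model away from simply ignoring tail tags.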

Frequently Asked Questions

Q: What makes this model unique?

This model pairs a Vision Transformer backbone with training tailored to anime and manga artwork tagging. Compared with previous versions, it reports a higher F1 score and handles class imbalance better through tag frequency-based loss scaling.

Q: What are the recommended use cases?

The model is well suited to automated tagging of anime and manga artwork, content organization, and image database systems that need detailed classification of artistic content.
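For automated tagging, the model's per-tag scores still need to be turned into a tag list. The sketch below shows one plausible post-processing step for a single image: report the highest-scoring rating, and keep character and general tags whose score reaches a threshold. The "rating"/"character"/"general" kind labels, the single shared threshold (defaulting to the 0.2614 reported above), the toy tag names, and the name `assign_tags` are assumptions; loading the model and its label file is not shown.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def assign_tags(
    scores: Sequence[float],
    tag_names: Sequence[str],
    tag_kinds: Sequence[str],
    threshold: float = 0.2614,
) -> Dict[str, object]:
    """Hypothetical post-processing of one image's per-tag scores.

    Assumptions: each tag is labelled "rating", "character" or "general";
    exactly one rating is reported (the highest-scoring one); character and
    general tags are kept when their score reaches `threshold`.
    """
    rating: Optional[str] = None
    best_rating_score = float("-inf")
    kept: Dict[str, List[Tuple[float, str]]] = {"character": [], "general": []}
    for score, name, kind in zip(scores, tag_names, tag_kinds):
        if kind == "rating":
            if score > best_rating_score:
                rating, best_rating_score = name, score
        elif kind in kept and score >= threshold:
            kept[kind].append((score, name))

    def by_score(pairs: List[Tuple[float, str]]) -> List[str]:
        # Highest-confidence tags first.
        return [name for _, name in sorted(pairs, reverse=True)]

    return {
        "rating": rating,
        "characters": by_score(kept["character"]),
        "general": by_score(kept["general"]),
    }


names = ["sensitive", "questionable", "hatsune_miku", "1girl", "long_hair", "outdoors"]
kinds = ["rating", "rating", "character", "general", "general", "general"]
scores = [0.7, 0.25, 0.95, 0.98, 0.6, 0.1]
print(assign_tags(scores, names, kinds))
```

Treating ratings as a pick-one choice while thresholding the other kinds reflects that an image has exactly one rating but any number of characters and general tags.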
