WD ViT-Large Tagger v3
| Property | Value |
|---|---|
| Parameter Count | 315M |
| Model Type | Vision Transformer |
| License | Apache 2.0 |
| Tensor Type | F32 |
| Framework | timm, ONNX, Safetensors |
What is wd-vit-large-tagger-v3?
WD ViT-Large Tagger v3 is a state-of-the-art image tagging model built on the Vision Transformer architecture. Trained on the extensive Danbooru dataset, it specializes in identifying and tagging anime and manga-style images with high precision. The model represents a significant upgrade from its predecessors, featuring enhanced compatibility with the timm library and improved batch processing capabilities.
Implementation Details
The model was trained using the JAX-CV framework with TPU support from the TRC program. It covers images from the Danbooru dataset up to ID 7220105 and uses an ID-based training-validation split: images whose IDs fall in the modulo range 0000-0899 form the training set, while those in the modulo range 0950-0999 form the validation set.
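The modulo-based split above can be sketched as a small helper. Note the modulus value of 1000 is an assumption for illustration; the card states only the bucket ranges:

```python
def danbooru_split(image_id: int, modulus: int = 1000) -> str:
    """Assign an image to a split by its Danbooru ID (modulus is assumed)."""
    bucket = image_id % modulus
    if bucket <= 899:
        return "train"
    if 950 <= bucket <= 999:
        return "val"
    return "unused"  # the 900-949 bucket is held out entirely
```

Buckets 900-949 fall in neither range, leaving a gap between the training and validation sets.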
- Achieves F1 score of 0.4674 at threshold 0.2606
- Supports batch inference in ONNX format
- Requires onnxruntime >= 1.17.0
- Compatible with timm library for easy integration
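A minimal sketch of batched ONNX inference might look like the following. The 448-pixel input size, white padding, BGR channel order, and NHWC layout are assumptions based on common conventions for WD taggers, not specifics confirmed by this card; the model path is a placeholder:

```python
import numpy as np

def pad_to_square(img: np.ndarray, fill: int = 255) -> np.ndarray:
    """Pad an HxWx3 uint8 image onto a square canvas (white fill assumed)."""
    h, w = img.shape[:2]
    side = max(h, w)
    canvas = np.full((side, side, 3), fill, dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize, kept dependency-free for this sketch."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[ys][:, xs]

def preprocess(img: np.ndarray, size: int = 448) -> np.ndarray:
    """RGB uint8 image -> float32 BGR array (size/layout are assumptions)."""
    x = resize_nearest(pad_to_square(img), size).astype(np.float32)
    return x[:, :, ::-1]  # RGB -> BGR

def run_batch(model_path: str, batch: np.ndarray) -> np.ndarray:
    """Batched ONNX inference; needs onnxruntime >= 1.17.0 per the card."""
    import onnxruntime as ort  # lazy import keeps the sketch importable
    session = ort.InferenceSession(model_path,
                                   providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: batch})[0]
```

Preprocessed images can be stacked with `np.stack` into a single batch before calling `run_batch`, which is where the >= 1.17.0 onnxruntime requirement comes in.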
Core Capabilities
- Comprehensive tagging support for ratings, characters, and general tags
- Filtered training on high-quality data (10+ general tags per image)
- Tag coverage for items with 600+ image examples
- Updated tag database through February 2024
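The model emits a per-tag confidence score, and the F1 figure quoted above corresponds to a cutoff of 0.2606. A sketch of applying that threshold, using a hypothetical tag vocabulary (the real model ships its full tag list separately):

```python
import numpy as np

# Hypothetical placeholder vocabulary for illustration only.
TAGS = ["1girl", "solo", "smile", "outdoors"]

def scores_to_tags(scores, threshold: float = 0.2606):
    """Keep tags whose score clears the threshold, highest score first."""
    scores = np.asarray(scores, dtype=np.float64)
    keep = [(TAGS[i], float(s)) for i, s in enumerate(scores)
            if s >= threshold]
    return sorted(keep, key=lambda pair: pair[1], reverse=True)
```

For example, `scores_to_tags([0.91, 0.30, 0.12, 0.40])` drops `smile` (0.12 is below the cutoff) and returns the remaining tags in descending score order. Raising the threshold trades recall for precision.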
Frequently Asked Questions
Q: What makes this model unique?
This model combines the power of Vision Transformers with extensive anime/manga domain knowledge, offering improved batch processing and broader framework compatibility compared to previous versions. Its training on a carefully curated dataset ensures high-quality tag predictions.
Q: What are the recommended use cases?
The model is ideal for automated tagging of anime and manga-style images, content organization, and database management. It's particularly useful for large-scale image classification tasks where accurate tag prediction is crucial.