vit-base-nsfw-detector

Property	Value
Parameter Count	86.1M
License	Apache 2.0
Base Model	google/vit-base-patch16-384
Accuracy	96.54%
AUC Score	99.48%

What is vit-base-nsfw-detector?

vit-base-nsfw-detector is a Vision Transformer (ViT) model specifically designed for detecting NSFW (Not Safe For Work) content in images. Built upon Google's vit-base-patch16-384 architecture, this model has been fine-tuned on approximately 25,000 diverse images including drawings and photographs. It employs a binary classification approach, categorizing images as either SFW (Safe For Work) or NSFW.
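
A minimal Python sketch of the binary classification described above, assuming the Hugging Face `transformers` library and Pillow are installed. The repository ID `AdamCodd/vit-base-nsfw-detector` and the lowercase label names `sfw` and `nsfw` are assumptions not stated in this document, so check them against the model's configuration before relying on them. The helper names `pick_label` and `classify_image` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List

# Assumed Hugging Face repository ID; not stated in this document.
MODEL_ID = "AdamCodd/vit-base-nsfw-detector"


def pick_label(predictions: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output."""
    return max(predictions, key=lambda p: p["score"])["label"]


def classify_image(path: str, model_id: str = MODEL_ID) -> str:
    """Classify one local image file and return the winning label."""
    # Imported lazily so pick_label works without these packages installed.
    from PIL import Image
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return pick_label(classifier(Image.open(path)))
```

Calling `classify_image("photo.jpg")` downloads the weights on first use. In a long-running service, the pipeline would normally be created once and reused rather than rebuilt per image.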

Implementation Details

The model uses the Vision Transformer architecture for image classification. It processes images at a resolution of 384x384 pixels and was fine-tuned with the Adam optimizer (learning rate: 3e-05, batch size: 32).
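
The input resolution and the 16-pixel patch size in the base model's name determine how many tokens the transformer processes per image. A short worked sketch, assuming the standard ViT layout of non-overlapping square patches plus one classification token (the function name is illustrative):

```python
def vit_sequence_length(image_size: int = 384, patch_size: int = 16) -> int:
    """Tokens for a square image: one per patch plus one [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 384 // 16 = 24
    return patches_per_side ** 2 + 1             # 24 * 24 + 1 = 577


print(vit_sequence_length())  # 577
```

For comparison, the same arithmetic at 224x224 gives 197 tokens, so the 384x384 variant works on roughly three times as many tokens per image.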

F32 tensor type for precise calculations
Transformers.js compatible for web deployment
Supports both local and remote image processing
Includes ONNX and Safetensors formats

Core Capabilities

Binary classification between SFW and NSFW content
High accuracy (96.54%) on natural images
Restrictive classification approach for better safety
Built-in support for various image formats
Easy integration with both Python and JavaScript environments
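
The restrictive behavior listed above is a property of the trained model, but a deployment can tighten it further by flagging an image whenever its NSFW score reaches a chosen cutoff instead of taking the top label. A sketch under stated assumptions: the input is pipeline-style output with an `nsfw` label, and the function name and the 0.3 default threshold are illustrative choices, not values from the model card.

```python
from typing import Dict, List


def is_flagged(predictions: List[Dict], threshold: float = 0.3) -> bool:
    """Flag an image when its NSFW score reaches the threshold."""
    scores = {p["label"].lower(): p["score"] for p in predictions}
    if "nsfw" not in scores:
        # Fail closed: unexpected labels are treated as unsafe.
        return True
    return scores["nsfw"] >= threshold
```

Lowering the threshold trades more false positives for fewer missed NSFW images. A suitable value would need to be chosen on a validation set that reflects the target content.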

Frequently Asked Questions

Q: What makes this model unique?

The model reports 96.54% accuracy and a 99.48% AUC score for NSFW detection. It is deliberately restrictive, classifying borderline content as NSFW for enhanced safety. It also supports multiple deployment options, including web integration via Transformers.js.

Q: What are the recommended use cases?

The model is best suited for content moderation in production environments, such as social media platforms and content filtering systems. However, it was not designed for AI-generated images and performs better on natural photographs and traditional digital art.