LlavaGuard-v1.2-0.5B-OV-hf
| Property | Value |
|---|---|
| Parameter Count | 894M |
| Model Type | Vision-Language Model |
| Tensor Type | FP16 |
| Paper | LlavaGuard (arXiv) |
| Base Model | llava-onevision-qwen2-0.5b-ov |
What is LlavaGuard-v1.2-0.5B-OV-hf?
LlavaGuard is a specialized vision-language model for content safety assessment and dataset curation. Built on the LLaVA-OneVision architecture, this 0.5B-class model (894M parameters in total) offers an efficient option that preserves strong safety-assessment performance while reducing compute requirements. It features a 32K token context window and is trained on the LlavaGuard-DS dataset.
Implementation Details
The model is implemented with the Hugging Face Transformers framework and runs in FP16 for efficient inference. It processes image and text inputs and produces detailed safety assessments across nine distinct policy categories; a minimal loading-and-inference sketch follows the list below.
- Supports HF Transformers inference pipeline
- Processes multi-modal inputs (images and text)
- Implements a comprehensive safety taxonomy
- Uses tuned generation parameters (sampling hyperparameters) for reliable, well-formed output
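As a minimal sketch of this pipeline, assuming an illustrative repository id, a placeholder policy prompt, and illustrative generation settings (none of these values are taken from this card):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Repository id is assumed for illustration; substitute the actual LlavaGuard checkpoint.
model_id = "AIML-TUDA/LlavaGuard-v1.2-0.5B-OV-hf"

# Load in FP16, matching the tensor type listed above.
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder prompt; LlavaGuard expects the full safety-policy text here.
policy_prompt = "Assess the safety of the image according to the provided policy."

conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": policy_prompt}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # any local image to be assessed
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

# Generation settings are illustrative, not the card's recommended values.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.95)
print(processor.decode(output[0], skip_special_tokens=True))
```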
Core Capabilities
- Content safety assessment across 9 policy categories
- Multi-modal understanding and reasoning
- Detailed rationale generation for safety decisions (see the output-parsing sketch after this list)
- Efficient processing with 894M parameters
- Support for research and dataset curation tasks
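The assessment is returned as structured text. The sketch below shows one way such a JSON-style response could be parsed downstream for dataset curation; the field names ("rating", "category", "rationale") and the example category label are assumptions for illustration, so check the card's prompt template for the exact schema.

```python
import json

def parse_assessment(generated_text: str) -> dict:
    """Extract the JSON object from the model's decoded output.

    Field names ("rating", "category", "rationale") are assumed here for
    illustration; adjust them to match the schema used by your policy prompt.
    """
    start, end = generated_text.find("{"), generated_text.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("No JSON object found in model output")
    return json.loads(generated_text[start:end])

# Example: flag images the model does not rate as safe during dataset curation.
# The category string below is illustrative, not an official taxonomy entry.
assessment = parse_assessment(
    '{"rating": "Unsafe", "category": "O2: Violence", "rationale": "..."}'
)
if assessment["rating"].lower() != "safe":
    print(f"Flagged: {assessment['category']} - {assessment['rationale']}")
```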
Frequently Asked Questions
Q: What makes this model unique?
LlavaGuard stands out for its specialized focus on content safety assessment, pairing vision-language understanding with a comprehensive safety taxonomy. It delivers strong safety-evaluation performance from a comparatively small, efficient architecture.
Q: What are the recommended use cases?
The model is aimed primarily at researchers and is designed for:
- Dataset curation and content safety assessment
- Research applications in content moderation
- Safety evaluation of visual content across multiple policy categories