LlavaGuard-v1.2-0.5B-OV-hf
| Property | Value |
|---|---|
| Parameter Count | 894M |
| Model Type | Vision-Language Model |
| Tensor Type | FP16 |
| Paper | LlavaGuard (arXiv) |
| Base Model | llava-onevision-qwen2-0.5b-ov |
What is LlavaGuard-v1.2-0.5B-OV-hf?
LlavaGuard is a specialized vision-language model for content safety assessment and dataset curation. Built on the LLaVA-OneVision architecture, this 0.5B-class model (894M parameters in total) offers an efficient option that preserves strong safety-assessment performance while reducing compute requirements. It features a 32K token context window and is trained on the LlavaGuard-DS dataset.
Implementation Details
The model is implemented with the Hugging Face Transformers framework and runs in FP16 for efficient inference. It processes image and text inputs and produces detailed safety assessments across nine distinct policy categories; a minimal loading-and-inference sketch follows the list below.
- Supports HF Transformers inference pipeline
- Processes multi-modal inputs (images and text)
- Implements a comprehensive safety taxonomy
- Uses tuned generation parameters (sampling hyperparameters) for reliable, well-formed output
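As a minimal sketch of this pipeline, assuming an illustrative repository id, a placeholder policy prompt, and illustrative generation settings (none of these values are taken from this card):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Repository id is assumed for illustration; substitute the actual LlavaGuard checkpoint.
model_id = "AIML-TUDA/LlavaGuard-v1.2-0.5B-OV-hf"

# Load in FP16, matching the tensor type listed above.
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder prompt; LlavaGuard expects the full safety-policy text here.
policy_prompt = "Assess the safety of the image according to the provided policy."

conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": policy_prompt}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # any local image to be assessed
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

# Generation settings are illustrative, not the card's recommended values.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.95)
print(processor.decode(output[0], skip_special_tokens=True))
```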
Core Capabilities
- Content safety assessment across 9 policy categories
- Multi-modal understanding and reasoning
- Detailed rationale generation for safety decisions (see the output-parsing sketch after this list)
- Efficient processing with 894M parameters
- Support for research and dataset curation tasks
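The assessment is returned as structured text. The sketch below shows one way such a JSON-style response could be parsed downstream for dataset curation; the field names ("rating", "category", "rationale") and the example category label are assumptions for illustration, so check the card's prompt template for the exact schema.

```python
import json

def parse_assessment(generated_text: str) -> dict:
    """Extract the JSON object from the model's decoded output.

    Field names ("rating", "category", "rationale") are assumed here for
    illustration; adjust them to match the schema used by your policy prompt.
    """
    start, end = generated_text.find("{"), generated_text.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("No JSON object found in model output")
    return json.loads(generated_text[start:end])

# Example: flag images the model does not rate as safe during dataset curation.
# The category string below is illustrative, not an official taxonomy entry.
assessment = parse_assessment(
    '{"rating": "Unsafe", "category": "O2: Violence", "rationale": "..."}'
)
if assessment["rating"].lower() != "safe":
    print(f"Flagged: {assessment['category']} - {assessment['rationale']}")
```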
Frequently Asked Questions
Q: What makes this model unique?
LlavaGuard stands out for its specialized focus on content safety assessment, pairing vision-language understanding with a comprehensive safety taxonomy. It delivers strong safety-evaluation performance from a comparatively small, efficient architecture.
Q: What are the recommended use cases?
The model is aimed primarily at researchers and is designed for:
- Dataset curation and content safety assessment
- Research applications in content moderation
- Safety evaluation of visual content across multiple policy categories