Granite Guardian 3.2 5B
| Property | Value |
|---|---|
| Developer | IBM Research |
| Release Date | February 26, 2025 |
| License | Apache 2.0 |
| Paper | Granite Guardian Paper |
What is granite-guardian-3.2-5b?
Granite Guardian 3.2 5B is a specialized AI safety model developed by IBM Research, designed to detect a range of risks in AI interactions. It is a streamlined version of the larger 8B-parameter Granite Guardian model, obtained through iterative pruning while maintaining high performance. The model serves as a guardian layer that flags harmful content, social bias, jailbreak attempts, and hallucinations in AI prompts and responses.
Implementation Details
The model is a thinned-down variant of the 8B-parameter original, pruned iteratively to roughly 5B parameters (about a 30% size reduction) while maintaining competitive performance. It frames risk assessment as a yes/no classification and supports multiple risk-detection scenarios through a flexible prompt-template system; a minimal usage sketch follows the feature list below.
- Optimized for both prompt and response risk detection
- Supports RAG (Retrieval-Augmented Generation) risk assessment
- Specialized function calling hallucination detection
- High performance on standard benchmarks (F1 score of 0.784 on aggregate harm detection)
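To make the yes/no scoring flow concrete, here is a minimal sketch assuming the Hugging Face transformers library and the ibm-granite/granite-guardian-3.2-5b checkpoint. The guardian_config argument and the "harm" risk name follow the conventions documented in the Granite Guardian model cards, but treat the exact template interface as an assumption to verify against the card.

```python
# Minimal risk-check sketch (assumptions: Hugging Face transformers,
# the ibm-granite/granite-guardian-3.2-5b checkpoint, and the
# guardian_config chat-template argument from the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-3.2-5b"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# A user prompt to screen; "harm" is the general harmful-content risk.
messages = [{"role": "user", "content": "How can I hurt someone and get away with it?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "harm"},  # selects the risk definition
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20, do_sample=False)

# The model emits a single Yes/No verdict for the requested risk.
verdict = tokenizer.decode(
    output[0, input_ids.shape[-1]:], skip_special_tokens=True
).strip()
print(verdict)  # "Yes" -> risky, "No" -> safe
```

The same call pattern covers response-side checks by appending an assistant turn to the message list before applying the template.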
Core Capabilities
- Harm Detection: Identifies general harmful content, social bias, profanity, violence, and sexual content
- Jailbreak Detection: Recognizes attempts to manipulate AI systems
- RAG Assessment: Evaluates context relevance, groundedness, and answer relevance (see the groundedness sketch after this list)
- Function Calling Validation: Detects syntactic and semantic hallucinations in API calls
- Multi-turn Conversation Analysis: Evaluates harm engagement and evasiveness
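The yes/no verdict can also be turned into a soft score by comparing the logits of the two verdict tokens, which is the usual way to drive threshold-based monitoring. The sketch below continues from the example above (reusing model and tokenizer) and applies this to a RAG groundedness check. The context role and the "groundedness" risk name mirror the model-card template but are assumptions here, as is the single-token Yes/No lookup, which may need adjusting for space-prefixed vocabulary entries.

```python
# Continues the sketch above (reuses `model` and `tokenizer`).
# Derives P(risky) from the Yes/No verdict-token logits instead of
# decoding text; role and risk names are assumptions from the model card.
import torch

def risk_probability(messages: list[dict], risk_name: str) -> float:
    """Return the normalized probability of the 'Yes' (risky) verdict."""
    input_ids = tokenizer.apply_chat_template(
        messages,
        guardian_config={"risk_name": risk_name},
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        next_logits = model(input_ids).logits[0, -1]  # next-token logits
    # NOTE: assumes "Yes"/"No" exist as single vocabulary tokens; adjust
    # the lookup (e.g. leading-space variants) for the actual tokenizer.
    yes_id = tokenizer.convert_tokens_to_ids("Yes")
    no_id = tokenizer.convert_tokens_to_ids("No")
    pair = torch.stack([next_logits[yes_id], next_logits[no_id]])
    return torch.softmax(pair, dim=0)[0].item()  # two-way softmax

# Groundedness: is the assistant answer supported by the context?
rag_turns = [
    {"role": "context", "content": "The Eiffel Tower is 330 metres tall."},
    {"role": "assistant", "content": "The Eiffel Tower is 500 metres tall."},
]
print(risk_probability(rag_turns, "groundedness"))  # high -> ungrounded
```

A probability rather than a hard verdict lets deployments tune the trade-off between false positives and missed risks per use case.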
Frequently Asked Questions
Q: What makes this model unique?
A: Its ability to detect a wide range of risks while staying efficient through parameter reduction sets it apart. The model achieves state-of-the-art performance on multiple benchmarks while requiring fewer computational resources than its 8B predecessor.
Q: What are the recommended use cases?
A: The model is ideal for enterprise applications that require risk assessment of AI interactions, including content moderation, RAG system validation, and function-calling verification. It is particularly suited to scenarios with moderate cost and latency requirements, such as model risk assessment and monitoring.