Granite Guardian 3.2 5B

Property	Value
Developer	IBM Research
Release Date	February 26, 2024
License	Apache 2.0
Paper	Granite Guardian Paper

What is granite-guardian-3.2-5b?

Granite Guardian 3.2 5B is a specialized AI safety model developed by IBM Research, designed to detect various risks in AI interactions. It's a streamlined version of the larger 8B parameter model, optimized through iterative pruning while maintaining high performance. The model serves as a guardian system for detecting harmful content, biases, jailbreaking attempts, and hallucinations in AI responses.

Implementation Details

The model employs a unique architecture that has been thinned down to 5B parameters (30% reduction from original) while maintaining competitive performance. It uses a yes/no scoring system for risk assessment and supports multiple risk detection scenarios through a flexible prompt template system.

Optimized for both prompt and response risk detection
Supports RAG (Retrieval-Augmented Generation) risk assessment
Specialized function calling hallucination detection
High performance on standard benchmarks (F1 score of 0.784 on aggregate harm detection)

Core Capabilities

Harm Detection: Identifies general harmful content, social bias, profanity, violence, and sexual content
Jailbreak Detection: Recognizes attempts to manipulate AI systems
RAG Assessment: Evaluates context relevance, groundedness, and answer relevance
Function Calling Validation: Detects syntactic and semantic hallucinations in API calls
Multi-turn Conversation Analysis: Evaluates harm engagement and evasiveness

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to detect a wide range of risks while maintaining high efficiency through parameter reduction makes it stand out. It achieves state-of-the-art performance on multiple benchmarks while requiring fewer computational resources than its predecessor.

Q: What are the recommended use cases?

The model is ideal for enterprise applications requiring risk assessment in AI interactions, including content moderation, RAG system validation, and function calling verification. It's particularly suited for scenarios requiring moderate cost and latency requirements like model risk assessment and monitoring.