granite-guardian-3.2-5b

Maintained By
ibm-granite

Granite Guardian 3.2 5B

PropertyValue
DeveloperIBM Research
Release DateFebruary 26, 2024
LicenseApache 2.0
PaperGranite Guardian Paper

What is granite-guardian-3.2-5b?

Granite Guardian 3.2 5B is a specialized AI safety model developed by IBM Research, designed to detect various risks in AI interactions. It's a streamlined version of the larger 8B parameter model, optimized through iterative pruning while maintaining high performance. The model serves as a guardian system for detecting harmful content, biases, jailbreaking attempts, and hallucinations in AI responses.

Implementation Details

The model employs a unique architecture that has been thinned down to 5B parameters (30% reduction from original) while maintaining competitive performance. It uses a yes/no scoring system for risk assessment and supports multiple risk detection scenarios through a flexible prompt template system.

  • Optimized for both prompt and response risk detection
  • Supports RAG (Retrieval-Augmented Generation) risk assessment
  • Specialized function calling hallucination detection
  • High performance on standard benchmarks (F1 score of 0.784 on aggregate harm detection)

Core Capabilities

  • Harm Detection: Identifies general harmful content, social bias, profanity, violence, and sexual content
  • Jailbreak Detection: Recognizes attempts to manipulate AI systems
  • RAG Assessment: Evaluates context relevance, groundedness, and answer relevance
  • Function Calling Validation: Detects syntactic and semantic hallucinations in API calls
  • Multi-turn Conversation Analysis: Evaluates harm engagement and evasiveness

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to detect a wide range of risks while maintaining high efficiency through parameter reduction makes it stand out. It achieves state-of-the-art performance on multiple benchmarks while requiring fewer computational resources than its predecessor.

Q: What are the recommended use cases?

The model is ideal for enterprise applications requiring risk assessment in AI interactions, including content moderation, RAG system validation, and function calling verification. It's particularly suited for scenarios requiring moderate cost and latency requirements like model risk assessment and monitoring.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.