Published: Jul 2, 2024
Updated: Oct 29, 2024

Pelican: Fighting AI Hallucinations in Vision-Language Models

Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
By
Pritish Sahu, Karan Sikka, Ajay Divakaran

Summary

Imagine asking an AI assistant about a photo on your phone and getting a completely fabricated description. This frustrating phenomenon, called "hallucination," poses a significant challenge for Vision-Language Models (VLMs), AI models that process both images and text. These models are designed to understand and reason about visual scenes: generating captions, answering questions, and even following complex instructions. However, they sometimes produce nonsensical or factually incorrect outputs, limiting their real-world applications.

Researchers at SRI International have introduced Pelican, a novel framework to combat these hallucinations. Pelican acts as a fact-checker for VLMs, scrutinizing their claims about images through a process called claim decomposition and verification. Think of it like breaking down a complex argument into smaller, verifiable statements: Pelican dissects a VLM's response into a chain of sub-claims, each represented by a question. If a VLM claims, "A man wearing glasses is riding a motorcycle," Pelican might generate the sub-claims "Is there a man?", "Is he wearing glasses?", "Is he riding something?", and "Is it a motorcycle?".

To answer these questions, Pelican uses a 'Program of Thought' approach: it prompts another large language model (LLM) to write Python code that calls readily available visual tools such as object detectors and Visual Question Answering (VQA) systems. The generated code acts as a bridge between Pelican's reasoning and the concrete visual evidence in the image. Pelican's key innovations include intermediate variables for accurate object grounding, shared computation across sub-questions for efficiency and consistency, and reasoning that combines natural language with program execution. For instance, it might locate the "man" with a bounding box and then verify whether he is wearing "glasses" within that region.

Tests on benchmarks including MMHal-Bench, GAVIE, and MME showed that Pelican significantly reduces hallucinations and improves accuracy across different VLMs. Notably, it achieved up to a 32% reduction in hallucination rates on MMHal-Bench and a 27% drop compared to other hallucination mitigation approaches.

Pelican marks a critical step toward more trustworthy and reliable VLMs. Its ability to systematically identify and correct hallucinations strengthens these models' potential for real-world applications, from assisting visually impaired users to automating complex analyses of images and videos. Although the research highlights some limitations, such as the brittleness of the underlying tools, Pelican points toward multimodal AI models that can reliably see and understand the world around us.
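To make the decompose-then-verify pipeline concrete, here is a minimal runnable sketch. Every name in it (decompose, verify, check_response, Verdict) is a hypothetical stand-in with the LLM and visual-tool calls stubbed out; the paper does not publish this exact code.

```python
# Minimal sketch of a Pelican-style pipeline; helper names are illustrative
# stand-ins, and the LLM/tool calls are stubbed with canned values.
from dataclasses import dataclass


@dataclass
class Verdict:
    sub_claim: str   # the yes/no question being checked
    supported: bool  # whether the visual evidence confirmed it


def decompose(vlm_response: str) -> list[str]:
    """Stand-in for the LLM prompt that splits a VLM response into
    a chain of individually verifiable sub-questions."""
    return [
        "Is there a man?",
        "Is he wearing glasses?",
        "Is he riding something?",
        "Is it a motorcycle?",
    ]


def verify(sub_claim: str, image_path: str) -> bool:
    """Stand-in for executing the LLM-generated Python program that
    calls visual tools (object detector, VQA) against the image."""
    return True  # stub: a real run returns the tools' actual answer


def check_response(image_path: str, vlm_response: str) -> list[Verdict]:
    # Any sub-claim the tools fail to confirm flags a hallucination.
    return [Verdict(q, verify(q, image_path)) for q in decompose(vlm_response)]


print(check_response("photo.jpg", "A man wearing glasses is riding a motorcycle."))
```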
🍰 Interested in building your own agents?
PromptLayer provides the tools to manage and monitor prompts with your whole team. Get started for free.

Question & Answers

How does Pelican's Program of Thought approach work to verify AI claims about images?
Pelican's Program of Thought approach converts natural language claims into executable Python code for image verification. The process works by first breaking down a complex claim (e.g., 'A man wearing glasses is riding a motorcycle') into simpler sub-claims. Then, an LLM generates Python code that leverages visual tools like object detectors and VQA systems to verify each sub-claim. For example, it might first use an object detector to locate a person, create a bounding box around them, and then check within that region for the presence of glasses. This systematic approach allows for precise verification of complex visual claims while maintaining computational efficiency through shared calculations between related sub-questions.
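The kind of program the LLM might generate could look like the sketch below. Here detect(), crop(), and vqa() are assumed tool wrappers (stubbed with canned values), not functions named in the paper; the point is how an intermediate bounding-box variable grounds the "man" once and is reused across dependent sub-questions.

```python
# Hypothetical example of an LLM-generated verification program; the tool
# wrappers are assumptions, stubbed here so the sketch runs end to end.

def detect(image, label):
    """Assumed wrapper returning bounding boxes for `label` in `image`."""
    return [(40, 60, 220, 400)]  # stub box: (x1, y1, x2, y2)


def crop(image, box):
    """Assumed helper cropping `image` to `box`."""
    return image  # stub: a real helper returns the cropped region


def vqa(image, question):
    """Assumed wrapper around a VQA model answering yes/no questions."""
    return "yes"  # stub answer


image = "photo.jpg"

# Intermediate variable: ground the "man" once with a bounding box...
man_boxes = detect(image, "man")
q1 = len(man_boxes) > 0

# ...then reuse that region for the dependent sub-questions, so related
# checks share computation and stay consistent with one another.
man_region = crop(image, man_boxes[0]) if q1 else None
q2 = q1 and vqa(man_region, "Is the man wearing glasses?") == "yes"
q3 = q1 and vqa(image, "Is the man riding a motorcycle?") == "yes"

print({"man present": q1, "wearing glasses": q2, "riding motorcycle": q3})
```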
What are the main benefits of AI hallucination detection in everyday applications?
AI hallucination detection helps ensure more reliable and trustworthy AI interactions in daily life. When AI systems can accurately verify their own responses, users receive more accurate information for tasks like photo organization, visual assistance for the visually impaired, or content moderation. For example, in social media, it can help prevent the spread of misleading AI-generated image descriptions. In healthcare, it can ensure more accurate analysis of medical images. This technology makes AI systems more dependable for critical decisions and reduces the risk of misinformation in various applications.
Why is visual-language AI becoming important for businesses and consumers?
Visual-language AI is revolutionizing how we interact with digital content and automate tasks. For businesses, it enables automated product cataloging, visual quality control, and enhanced customer service through visual search and recognition. For consumers, it provides more intuitive ways to search for products, organize photos, and access information about their surroundings. The technology can help visually impaired users better understand their environment, assist in educational contexts through visual learning aids, and improve security systems through better object and activity recognition. Its growing accuracy and reliability make it increasingly valuable for both practical and innovative applications.

PromptLayer Features

1. Testing & Evaluation
Pelican's systematic verification approach aligns with PromptLayer's testing capabilities for evaluating model outputs against ground truth.
Implementation Details
1. Create test suites with image-text pairs
2. Configure verification metrics
3. Set up automated testing pipelines
4. Track hallucination rates (see the sketch after this list)
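As a minimal sketch of those steps, the snippet below defines a test case shape and a hallucination-rate metric. The TestCase fields and the model_fn/judge_fn hooks are illustrative assumptions, not PromptLayer's actual API.

```python
# Illustrative test-suite skeleton; shapes and hooks are assumptions.
from dataclasses import dataclass


@dataclass
class TestCase:
    image_path: str    # image shown to the VLM
    prompt: str        # question or instruction
    ground_truth: str  # human-verified reference answer


def hallucination_rate(cases, model_fn, judge_fn):
    """Run each case through the model; `judge_fn` returns True when the
    response contradicts the ground truth (i.e., hallucinates)."""
    failures = sum(
        judge_fn(model_fn(c.image_path, c.prompt), c.ground_truth)
        for c in cases
    )
    return failures / max(len(cases), 1)


# Stubbed usage; in practice model_fn wraps the VLM under test and
# judge_fn wraps a verification metric or an LLM judge.
cases = [TestCase("street.jpg", "What is the man riding?", "a motorcycle")]
rate = hallucination_rate(cases, lambda img, p: "a motorcycle",
                          lambda resp, gt: resp != gt)
print(f"hallucination rate: {rate:.0%}")
```

Tracking this rate per model version is what makes the automated regression testing below possible.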
Key Benefits
• Systematic evaluation of model accuracy
• Automated regression testing
• Performance tracking across model versions
Potential Improvements
• Integration with custom verification tools
• Enhanced visualization of test results
• Expanded metric collection capabilities
Business Value
Efficiency Gains
Reduces manual verification time by 70%
Cost Savings
Minimizes resources spent on error detection
Quality Improvement
Ensures consistent model output quality
2. Workflow Management
Pelican's multi-step verification process maps to PromptLayer's workflow orchestration capabilities.
Implementation Details
1. Define verification steps as modules
2. Create reusable templates
3. Configure step dependencies
4. Monitor workflow execution (see the sketch after this list)
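A minimal sketch of such step-based orchestration follows. The step names, the dependency map, and the shared context dict are assumptions for illustration, not PromptLayer's actual workflow API.

```python
# Illustrative dependency-ordered workflow; names and shapes are assumed.

# Each verification step is a module; dependencies force decomposition
# to run before verification, and verification before aggregation.
steps = {
    "decompose": {"deps": [],
                  "run": lambda ctx: ctx | {"sub_claims": ["Is there a man?"]}},
    "verify":    {"deps": ["decompose"],
                  "run": lambda ctx: ctx | {"verdicts": [True]}},
    "aggregate": {"deps": ["verify"],
                  "run": lambda ctx: ctx | {"hallucinated": not all(ctx["verdicts"])}},
}


def execute(steps, ctx):
    """Run steps whose dependencies are satisfied until all complete;
    each executed step name is recorded, making the run traceable."""
    done = set()
    while len(done) < len(steps):
        for name, step in steps.items():
            if name not in done and all(d in done for d in step["deps"]):
                ctx = step["run"](ctx)
                done.add(name)
    return ctx


print(execute(steps, {}))
```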
Key Benefits
• Streamlined verification process
• Reusable verification templates
• Traceable execution history
Potential Improvements
• Enhanced error handling
• Dynamic workflow adaptation
• Better process visualization
Business Value
Efficiency Gains
Reduces workflow setup time by 50%
Cost Savings
Optimizes resource utilization through automation
Quality Improvement
Ensures consistent verification processes
