Mirage-in-the-Eyes
Property | Value |
---|---|
Author | RachelHGF |
Repository | Hugging Face |
Type | Adversarial Attack Framework |
What is Mirage-in-the-Eyes?
Mirage-in-the-Eyes is an attack framework designed to expose and exploit vulnerabilities in Multi-modal Large Language Models (MLLMs). It targets the hallucination problem by manipulating attention mechanisms: it crafts adversarial inputs that lead an MLLM to generate inaccurate content while the response otherwise remains high quality. The framework has been tested against 6 prominent MLLMs, including commercial APIs such as GPT-4 and Gemini 1.5.
Implementation Details
The framework implements a novel approach to hallucination attacks based on attention sink manipulation. It generates dynamic, transferable visual adversarial inputs without degrading response quality. The implementation is a Python codebase that covers attack generation, response evaluation, and GPT-4-assisted assessment.
- Utilizes attention sink behaviors for targeted hallucination generation
- Implements dynamic adversarial input generation
- Provides comprehensive evaluation tools
- Compatible with multiple MLLM architectures
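To make the term "attention sink" concrete, the sketch below flags key positions that receive a disproportionate share of attention in a toy score matrix. It illustrates the general concept only and is not the framework's code: the function names, the `ratio` threshold, and the toy scores are all assumptions made for this example, and causal masking is omitted for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of raw attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_received(scores):
    """Average attention weight each key position receives across all queries."""
    weights = [softmax(row) for row in scores]
    n_queries = len(weights)
    n_keys = len(weights[0])
    return [sum(w[j] for w in weights) / n_queries for j in range(n_keys)]


def find_sinks(scores, ratio=2.0):
    """Flag key positions receiving more than `ratio` times the uniform share.

    `ratio` is a hypothetical threshold chosen for illustration.
    """
    received = attention_received(scores)
    uniform = 1.0 / len(received)
    return [j for j, r in enumerate(received) if r > ratio * uniform]


# Toy scores (rows = queries, columns = keys); column 0 dominates every row.
toy_scores = [
    [4.0, 0.1, 0.2, 0.0],
    [3.5, 0.3, 0.1, 0.2],
    [4.2, 0.0, 0.4, 0.1],
]
sinks = find_sinks(toy_scores)
print(sinks)
```

In a real MLLM the scores would come from the model's attention layers rather than a hand-written matrix; the same column-wise aggregation is one simple way to see which tokens act as sinks.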
Core Capabilities
- Generate effective visual adversarial inputs
- Trigger controlled hallucinations in MLLMs
- Maintain high-quality model responses
- Transfer attacks across different MLLM architectures
- Evaluate and measure hallucination effectiveness
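As an illustration of how hallucination could be measured, the sketch below computes an object-level and a sentence-level hallucination rate for a single response, in the spirit of object-hallucination metrics such as CHAIR. It is a simplified example under stated assumptions, not the framework's evaluation code: the helper names, the word-matching rule, and the sample data are hypothetical.

```python
import re


def mentioned_objects(sentence, vocabulary):
    """Return the vocabulary objects that appear as words in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return words & vocabulary


def hallucination_rates(response, present_objects, vocabulary):
    """Return (object_rate, sentence_rate) for one model response.

    object_rate:   hallucinated object mentions / all object mentions
    sentence_rate: sentences with a hallucinated object / all sentences
    """
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_sentences = 0
    for sentence in sentences:
        objects = mentioned_objects(sentence, vocabulary)
        fake = objects - present_objects  # mentioned but not in the image
        total_mentions += len(objects)
        hallucinated_mentions += len(fake)
        if fake:
            hallucinated_sentences += 1
    object_rate = hallucinated_mentions / total_mentions if total_mentions else 0.0
    sentence_rate = hallucinated_sentences / len(sentences) if sentences else 0.0
    return object_rate, sentence_rate


# Hypothetical example: the image contains only a dog and a frisbee.
vocabulary = {"dog", "cat", "frisbee", "bench"}
present = {"dog", "frisbee"}
response = "A dog jumps for a frisbee. A cat watches from a bench."
object_rate, sentence_rate = hallucination_rates(response, present, vocabulary)
print(object_rate, sentence_rate)
```

Comparing these rates for responses to a clean image and to its adversarial counterpart gives a rough measure of how much an attack increases hallucination.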
Frequently Asked Questions
Q: What makes this framework unique?
Unlike previous adversarial methods that rely on fixed patterns, Mirage-in-the-Eyes generates dynamic and transferable visual adversarial inputs. It specifically targets the attention mechanisms of MLLMs, making it effective against even well-defended systems.
Q: What are the recommended use cases?
The framework is intended for research purposes only: identifying and understanding vulnerabilities in MLLMs. Access to the source code is restricted and granted only on request, to prevent potential misuse.