Mirage-in-the-Eyes

Property	Value
Author	RachelHGF
Repository	Hugging Face
Type	Adversarial Attack Framework

What is Mirage-in-the-Eyes?

Mirage-in-the-Eyes is an attack framework designed to expose and exploit vulnerabilities in Multi-modal Large Language Models (MLLMs). It targets the hallucination problem by manipulating attention mechanisms: it creates adversarial inputs that cause an MLLM to generate inaccurate content while its responses remain otherwise high in quality. The framework has been tested against six prominent MLLMs, including commercial APIs such as GPT-4 and Gemini 1.5.

Implementation Details

The framework implements a novel approach to hallucination attacks through attention sink manipulation. It generates dynamic and transferable visual adversarial inputs without compromising response quality. The implementation is provided through a Python-based codebase that includes attack generation, response evaluation, and GPT-4 assisted assessment capabilities.

Utilizes attention sink behaviors for targeted hallucination generation
Implements dynamic adversarial input generation
Provides comprehensive evaluation tools
Supports multiple MLLM architectures
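
To make the term "attention sink" concrete, here is a small diagnostic sketch in Python. It is not taken from the framework's code (which is available only on request) and it does not generate any attack; it only flags tokens that receive a disproportionate share of attention in a given attention matrix. The function name find_attention_sinks, the ratio threshold, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def find_attention_sinks(attn, ratio=3.0):
    """Return indices of key tokens that receive unusually high attention.

    attn  : array of shape (num_queries, num_keys); each row sums to 1.
    ratio : hypothetical threshold; a key counts as a sink when its mean
            received attention exceeds `ratio` times the uniform share.
    """
    attn = np.asarray(attn, dtype=float)
    received = attn.mean(axis=0)      # average attention each key receives
    uniform = 1.0 / attn.shape[1]     # share under perfectly uniform attention
    return [int(i) for i in np.flatnonzero(received > ratio * uniform)]

# Toy attention matrix (made-up values): token 0 absorbs most of the attention.
attn = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.80, 0.05, 0.10, 0.05],
    [0.90, 0.04, 0.03, 0.03],
])
sinks = find_attention_sinks(attn, ratio=2.0)
print(sinks)
```

In a real model the matrix would come from one attention head of one layer, and a suitable threshold would likely need tuning per model.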

Core Capabilities

Generate effective visual adversarial inputs
Trigger controlled hallucinations in MLLMs
Maintain high-quality model responses
Transfer attacks across different MLLM architectures
Evaluate and measure hallucination effectiveness
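
As an illustration of what measuring hallucination can look like, the sketch below computes a simple object-level hallucination rate: the fraction of objects mentioned in a response that do not appear in the image's ground-truth annotations. This is a simplified stand-in in the spirit of object-hallucination metrics, not the framework's own GPT-4 assisted assessment; the function name hallucination_rate and the example object lists are assumptions.

```python
def hallucination_rate(mentioned, ground_truth):
    """Fraction of mentioned objects that are absent from the ground truth.

    mentioned    : iterable of object names extracted from a model response.
    ground_truth : iterable of object names actually present in the image.
    Returns 0.0 when the response mentions no objects.
    """
    mentioned = {m.lower() for m in mentioned}
    ground_truth = {g.lower() for g in ground_truth}
    if not mentioned:
        return 0.0
    hallucinated = mentioned - ground_truth   # objects with no visual support
    return len(hallucinated) / len(mentioned)

# Hypothetical example: "car" is mentioned but is not in the image.
rate = hallucination_rate(["dog", "frisbee", "car"], ["dog", "frisbee", "grass"])
print(f"{rate:.2f}")
```

Comparing this rate on clean and perturbed versions of the same image would give a rough view of how much an input change increases hallucination, assuming object names can be extracted reliably from the responses.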

Frequently Asked Questions

Q: What makes this framework unique?

Unlike previous adversarial methods that rely on fixed patterns, Mirage-in-the-Eyes generates dynamic and transferable visual adversarial inputs. It specifically targets the attention mechanisms of MLLMs, making it effective against even well-defended systems.

Q: What are the recommended use cases?

The framework is intended for research use only, as a tool for identifying and understanding vulnerabilities in MLLMs. Access to the source code is restricted and is provided only upon request, to prevent potential misuse.