Mirage-in-the-Eyes

Maintained By
RachelHGF


Property | Value
Author | RachelHGF
Repository | Hugging Face
Type | Adversarial Attack Framework

What is Mirage-in-the-Eyes?

Mirage-in-the-Eyes is an attack framework designed to expose and exploit vulnerabilities in Multi-modal Large Language Models (MLLMs). It targets the hallucination problem by manipulating attention mechanisms. The framework crafts adversarial inputs that cause MLLMs to generate inaccurate content, while the responses themselves remain fluent and otherwise high quality. It has been tested against six prominent MLLMs, including commercial APIs such as GPT-4 and Gemini 1.5.

Implementation Details

The framework takes a novel approach to hallucination attacks based on attention sink manipulation. It generates dynamic, transferable visual adversarial inputs without degrading response quality. The implementation is a Python codebase covering attack generation, response evaluation, and GPT-4-assisted assessment.

  • Utilizes attention sink behaviors for targeted hallucination generation
  • Implements dynamic adversarial input generation
  • Provides evaluation tools, including GPT-4-assisted assessment of model responses
  • Compatible with multiple MLLM architectures
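The framework's source code is not public, so the following is only an illustrative sketch of what an "attention sink" looks like. It is not the framework's method. It assumes an attention tensor of shape (heads, queries, keys), such as one layer's weights returned by a Hugging Face transformers model called with `output_attentions=True`, with the batch dimension dropped. The function name `find_attention_sinks` and the `ratio` threshold are hypothetical choices. A key token is flagged as a sink when the attention it receives, averaged over heads and queries, far exceeds the uniform share of 1/keys.

```python
import numpy as np


def find_attention_sinks(attn: np.ndarray, ratio: float = 5.0) -> list[int]:
    """Return indices of key tokens that absorb a disproportionate share of attention.

    attn:  array of shape (heads, queries, keys); each row over keys sums to 1.
    ratio: hypothetical threshold; a key is flagged when its mean received
           attention exceeds `ratio` times the uniform share 1 / keys.
    """
    if attn.ndim != 3:
        raise ValueError("expected attention of shape (heads, queries, keys)")
    num_keys = attn.shape[-1]
    # Average over heads and queries: how much attention each key receives.
    received = attn.mean(axis=(0, 1))
    threshold = ratio / num_keys
    return [int(i) for i in np.flatnonzero(received > threshold)]


# Toy example: 2 heads, 4 queries, 8 keys, where token 0 soaks up 90% of
# every query's attention and the remaining 10% is spread evenly.
heads, queries, keys = 2, 4, 8
attn = np.full((heads, queries, keys), 0.1 / (keys - 1))
attn[..., 0] = 0.9
print(find_attention_sinks(attn))  # [0]
```

Averaging over both heads and queries keeps the sketch simple. A per-head analysis would reveal sinks that only some heads exhibit.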

Core Capabilities

  • Generate effective visual adversarial inputs
  • Trigger controlled hallucinations in MLLMs
  • Maintain high-quality model responses
  • Transfer attacks across different MLLM architectures
  • Measure how effectively attacks induce hallucinations
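One common way to quantify object hallucination is the CHAIR metric from image captioning. The sketch below computes CHAIR-style scores. The framework's own evaluation is described as including GPT-4-assisted assessment, and whether it uses CHAIR is not stated. The sketch assumes the objects mentioned in each response and the objects actually in the image have already been extracted into sets. `CaptionEval` and `chair_scores` are hypothetical names. CHAIR_i is the fraction of mentioned objects that are absent from the image. CHAIR_s is the fraction of responses containing at least one such object.

```python
from dataclasses import dataclass


@dataclass
class CaptionEval:
    mentioned: set[str]     # objects mentioned in the model's response
    ground_truth: set[str]  # objects actually present in the image


def chair_scores(evals: list[CaptionEval]) -> tuple[float, float]:
    """Return (CHAIR_i, CHAIR_s) for a batch of responses."""
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for ev in evals:
        # Objects the model mentions that are not in the image.
        hallucinated = ev.mentioned - ev.ground_truth
        total_mentions += len(ev.mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            hallucinated_captions += 1
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = hallucinated_captions / len(evals) if evals else 0.0
    return chair_i, chair_s


# Hypothetical responses to the same two images, before and after an attack.
clean = [
    CaptionEval({"dog", "frisbee"}, {"dog", "frisbee", "grass"}),
    CaptionEval({"cat"}, {"cat", "sofa"}),
]
attacked = [
    CaptionEval({"dog", "frisbee", "car"}, {"dog", "frisbee", "grass"}),
    CaptionEval({"cat", "laptop"}, {"cat", "sofa"}),
]
print(chair_scores(clean))     # (0.0, 0.0)
print(chair_scores(attacked))  # (0.4, 1.0)
```

Comparing scores on clean and adversarial inputs gives a rough measure of how much an attack increases hallucination. In practice, the hard part is extracting reliable object sets from free-form responses.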

Frequently Asked Questions

Q: What makes this framework unique?

Unlike previous adversarial methods that rely on fixed patterns, Mirage-in-the-Eyes generates dynamic and transferable visual adversarial inputs. Because it targets the attention mechanisms of MLLMs directly, it remains effective even against models that employ hallucination-mitigation defenses.

Q: What are the recommended use cases?

The framework is intended for research purposes only to identify and understand vulnerabilities in MLLMs. Access to the source code is restricted and provided only upon request to prevent potential misuse.
