InternVL2-8B-MPO

Maintained By
OpenGVLab

  • Parameter Count: 8.08B
  • Model Type: Multimodal LLM
  • License: MIT
  • Paper: arXiv:2411.10442
  • Tensor Type: BF16

What is InternVL2-8B-MPO?

InternVL2-8B-MPO is an advanced multimodal large language model that enhances the original InternVL2-8B through Mixed Preference Optimization (MPO). The model addresses the challenge of distribution shifts in multimodal reasoning by incorporating a novel preference optimization process.

Implementation Details

The model builds upon InternVL2-8B and introduces two key technical innovations: an automated preference-data construction pipeline used to create the MMPR dataset, and a Mixed Preference Optimization training approach that significantly improves multimodal Chain-of-Thought (CoT) performance.

  • Achieves 67.0% accuracy on MathVista, outperforming the base model by 8.7 points
  • Implements advanced visual-linguistic processing capabilities
  • Supports multiple deployment options including 4-bit and 8-bit quantization
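Quantization matters here because weight memory scales directly with bit width. A rough back-of-envelope estimate for the 8.08B parameters listed above (weights only, ignoring activations and KV cache; the helper name is illustrative):

```python
def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in decimal gigabytes (weights only)."""
    return num_params * bits_per_param / 8 / 1e9

# 8.08B parameters at different precisions
for bits in (16, 8, 4):  # BF16, 8-bit, 4-bit
    print(f"{bits:>2}-bit: ~{weight_footprint_gb(8.08e9, bits):.2f} GB")
```

This works out to roughly 16 GB at BF16, 8 GB at 8-bit, and 4 GB at 4-bit, which is why the quantized variants fit on much smaller GPUs.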

Core Capabilities

  • Enhanced multimodal reasoning and Chain-of-Thought performance
  • Reduced hallucination compared to base model
  • Support for multi-image and video processing
  • Multilingual capabilities
  • Streaming output support
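InternVL-style chat interfaces typically mark image positions in the text with `<image>` placeholder tokens; the actual API is defined by the model's remote code on Hugging Face. A minimal sketch of assembling a multi-image question (the helper function and `Image-N:` labels are illustrative, not part of the official API):

```python
def build_multi_image_prompt(num_images: int, question: str) -> str:
    """Prefix a question with one labeled <image> placeholder per image."""
    headers = [f"Image-{i + 1}: <image>" for i in range(num_images)]
    return "\n".join(headers + [question])

prompt = build_multi_image_prompt(2, "What differs between the two images?")
```

The resulting string would be passed to the model's chat method alongside the stacked pixel values for both images.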

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its Mixed Preference Optimization approach, which significantly improves multimodal reasoning capabilities while maintaining efficient performance with just 8B parameters.
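Per the MPO paper (arXiv:2411.10442), the training objective blends a DPO-style preference term with quality and generation (language-modeling) terms. A minimal numeric sketch of that combination; the weights, helper names, and example values below are illustrative assumptions, not the paper's settings:

```python
import math

def log_sigmoid(x: float) -> float:
    # numerically stable log(sigmoid(x))
    return x - math.log1p(math.exp(x)) if x < 0 else -math.log1p(math.exp(-x))

def preference_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style term: rewards the policy for widening the chosen/rejected
    log-prob gap relative to a frozen reference model."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -log_sigmoid(margin)

def mpo_loss(l_pref, l_quality, l_gen, w_pref=1.0, w_quality=1.0, w_gen=1.0):
    """Weighted sum of preference, quality, and generation losses
    (equal weights here are an illustrative choice)."""
    return w_pref * l_pref + w_quality * l_quality + w_gen * l_gen

# Example: policy vs. reference log-probs for one preference pair
l_p = preference_loss(pi_chosen=-1.0, pi_rejected=-2.0,
                      ref_chosen=-1.2, ref_rejected=-1.5)
total = mpo_loss(l_p, l_quality=0.3, l_gen=1.1)
```

Intuitively, the preference term alone would only teach relative rankings; the quality and generation terms keep absolute response quality and fluency anchored, which is what mitigates the distribution-shift problem mentioned above.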

Q: What are the recommended use cases?

The model excels in multimodal reasoning tasks, image-text interactions, visual question answering, and complex visual analysis scenarios. It's particularly strong in tasks requiring detailed reasoning about visual content.
