InternVL2-8B-MPO

Maintained By
OpenGVLab

  • Parameter Count: 8.08B
  • License: MIT
  • Paper: arXiv:2411.10442
  • Tensor Type: BF16

What is InternVL2-8B-MPO?

InternVL2-8B-MPO is a multimodal large language model built on the InternVL2-8B base model. It is fine-tuned with Mixed Preference Optimization (MPO), a preference-learning procedure designed to strengthen multimodal reasoning and reduce hallucination.

Implementation Details

The model is optimized through preference learning on the MMPR dataset, a large-scale multimodal reasoning preference dataset. For deployment, it supports 16-bit (bf16/fp16) precision as well as 8-bit and 4-bit quantization for memory-efficient inference.

  • Advanced multimodal reasoning capabilities through MPO training
  • Support for multiple image inputs and batch processing
  • Streaming output capability for real-time generation
  • Flexible deployment options with varying precision levels
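The precision options above map onto `from_pretrained` keyword arguments in Hugging Face Transformers. Below is a minimal sketch; the `loading_kwargs` helper and its precision labels are illustrative (not part of any official API), and the commented-out loading call assumes the standard Transformers interface with `trust_remote_code=True` as used by InternVL2 checkpoints.

```python
def loading_kwargs(precision: str) -> dict:
    """Map a precision label to keyword arguments for `from_pretrained`.

    Hypothetical helper: the labels ("bf16", "fp16", "int8", "int4") are
    our own convention for this sketch.
    """
    if precision == "bf16":
        return {"torch_dtype": "bfloat16"}   # 16-bit brain float
    if precision == "fp16":
        return {"torch_dtype": "float16"}    # 16-bit half precision
    if precision == "int8":
        return {"load_in_8bit": True}        # 8-bit quantized inference
    if precision == "int4":
        return {"load_in_4bit": True}        # 4-bit quantized inference
    raise ValueError(f"unsupported precision: {precision}")


if __name__ == "__main__":
    # Actual loading (requires a GPU and a multi-GB download), sketched only:
    # from transformers import AutoModel
    # model = AutoModel.from_pretrained(
    #     "OpenGVLab/InternVL2-8B-MPO",
    #     trust_remote_code=True,
    #     low_cpu_mem_usage=True,
    #     **loading_kwargs("bf16"),
    # ).eval()
    print(loading_kwargs("int8"))
```

Lower-precision options trade a small amount of accuracy for a large reduction in GPU memory, which is what makes single-GPU inference of an 8B multimodal model practical.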

Core Capabilities

  • Strong performance on MathVista (67.0% accuracy)
  • Enhanced Chain-of-Thought reasoning
  • Multilingual support
  • Multi-image and video processing
  • Real-time conversation abilities
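For multi-image input, InternVL2's chat interface expects one `<image>` placeholder per image in the prompt text. The prompt-builder below is a minimal sketch of that convention; the function name and the `Image-N:` labeling style are our own choices for illustration, and the commented lines assume the model's `chat` method signature as an approximation.

```python
def build_multi_image_question(question: str, num_images: int) -> str:
    """Prefix a question with one `<image>` placeholder per input image.

    Illustrative helper, not an official API: each placeholder is where the
    model interleaves the visual tokens for the corresponding image.
    """
    prefix = "".join(f"Image-{i + 1}: <image>\n" for i in range(num_images))
    return prefix + question


if __name__ == "__main__":
    question = build_multi_image_question("What differs between the images?", 2)
    print(question)
    # With a loaded model, the call would look roughly like (sketch only):
    # response = model.chat(
    #     tokenizer, pixel_values, question, generation_config,
    #     num_patches_list=num_patches_list,  # tile counts per image
    # )
```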

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its Mixed Preference Optimization approach, which significantly improves its reasoning capabilities, particularly in multimodal tasks. It achieves performance comparable to models 10 times its size on certain benchmarks.

Q: What are the recommended use cases?

The model excels in multimodal reasoning tasks, visual-linguistic conversations, mathematical problem-solving, and general image understanding. It's particularly suited for applications requiring strong reasoning capabilities with visual inputs.
