InternVL2-8B-MPO
| Property | Value |
|---|---|
| Parameter Count | 8.08B |
| Model Type | Multimodal LLM |
| License | MIT |
| Paper | arXiv:2411.10442 |
| Tensor Type | BF16 |
What is InternVL2-8B-MPO?
InternVL2-8B-MPO is a multimodal large language model that enhances InternVL2-8B through Mixed Preference Optimization (MPO). It targets a known weakness of supervised fine-tuning: the distribution shift between teacher-forced training and free-running generation, which degrades multimodal Chain-of-Thought reasoning. MPO adds a preference optimization stage, trained on automatically constructed preference data, to close this gap.
Implementation Details
The model builds upon InternVL2-8B and introduces two key technical components: an automated preference-data construction pipeline that produces the large-scale MMPR (Multimodal Reasoning Preference) dataset, and the Mixed Preference Optimization objective itself, which markedly improves multimodal Chain-of-Thought (CoT) performance.
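Per the paper, the MPO training objective is a weighted combination of three losses: a preference loss (DPO) over chosen/rejected response pairs, a quality loss (BCO) on individual responses, and a generation loss (standard SFT):

$$
\mathcal{L}_{\text{MPO}} = w_{p}\,\mathcal{L}_{p} + w_{q}\,\mathcal{L}_{q} + w_{g}\,\mathcal{L}_{g}
$$

The weights $w_p$, $w_q$, and $w_g$ are training hyperparameters; see arXiv:2411.10442 for the exact settings.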
- Achieves 67.0% accuracy on MathVista, outperforming the base InternVL2-8B by 8.7 points
- Inherits the InternVL2-8B architecture (InternViT-300M-448px vision encoder, MLP projector, InternLM2.5-7B-Chat language model)
- Supports multiple deployment options, including 8-bit and 4-bit quantized loading (see the sketch after this list)
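A minimal loading sketch, assuming the Hugging Face repo id `OpenGVLab/InternVL2-8B-MPO` and the loading pattern documented in the InternVL2 model cards; the 4-bit `bitsandbytes` configuration is an untested assumption:

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2-8B-MPO"

# Default BF16 load (matches the tensor type listed in the table above).
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # the modeling code ships with the checkpoint
).eval().cuda()

# 8-bit alternative: replace torch_dtype=... with load_in_8bit=True.
# 4-bit alternative (assumption): pass quantization_config=BitsAndBytesConfig(
#     load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16).

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
```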
Core Capabilities
- Enhanced multimodal reasoning and Chain-of-Thought performance
- Reduced hallucination compared to the base model
- Support for multi-image and video processing
- Multilingual capabilities
- Streaming output support (illustrated in the usage sketch after this list)
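A usage sketch reusing `model` and `tokenizer` from the loading example above. The `chat()` helper and the `<image>` placeholder follow the InternVL2 model cards; the single 448x448 tile below is a simplification of the dynamic tiling preprocessing used there, and `example.jpg` is a placeholder path:

```python
from threading import Thread

import torch
import torchvision.transforms as T
from PIL import Image
from transformers import TextIteratorStreamer

# Single-tile preprocessing with ImageNet normalization.
transform = T.Compose([
    T.Lambda(lambda img: img.convert("RGB")),
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

# Shape (1, 3, 448, 448); cast to BF16 to match the model weights.
pixel_values = transform(Image.open("example.jpg")).unsqueeze(0).to(torch.bfloat16).cuda()

generation_config = dict(max_new_tokens=1024, do_sample=False)

# Single-image VQA: "<image>" marks where the image tokens are inserted.
question = "<image>\nDescribe this image in detail."
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)

# Streaming variant: run chat() in a worker thread and consume tokens live.
streamer = TextIteratorStreamer(
    tokenizer, skip_prompt=True, skip_special_tokens=True, timeout=10
)
Thread(target=model.chat, kwargs=dict(
    tokenizer=tokenizer,
    pixel_values=pixel_values,
    question=question,
    generation_config=dict(max_new_tokens=1024, do_sample=False, streamer=streamer),
)).start()
for new_text in streamer:
    print(new_text, end="", flush=True)
```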
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its Mixed Preference Optimization approach, which significantly improves multimodal reasoning capabilities while maintaining efficient performance with just 8B parameters.
Q: What are the recommended use cases?
The model excels in multimodal reasoning tasks, image-text interactions, visual question answering, and complex visual analysis scenarios. It's particularly strong in tasks requiring detailed reasoning about visual content.
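As a sketch of a reasoning-heavy session, a multi-turn follow-up can reuse the objects from the usage example above; the `history`/`return_history` arguments follow the InternVL2 model cards, and the prompts are hypothetical:

```python
# Chain-of-Thought style VQA over a chart, with conversation history.
question = "<image>\nHow many bars are in this chart, and which is tallest? Think step by step."
response, history = model.chat(tokenizer, pixel_values, question,
                               generation_config, history=None, return_history=True)

follow_up = "Based on that, estimate the value of the tallest bar."
response, history = model.chat(tokenizer, pixel_values, follow_up,
                               generation_config, history=history, return_history=True)
print(response)
```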