InternVL2-8B-MPO
| Property | Value |
|---|---|
| Parameter Count | 8.08B |
| License | MIT |
| Paper | arXiv:2411.10442 |
| Tensor Type | BF16 |
What is InternVL2-8B-MPO?
InternVL2-8B-MPO is a multimodal large language model built on the InternVL2-8B base model and further trained with Mixed Preference Optimization (MPO), a preference-learning process designed to strengthen multimodal reasoning and reduce hallucinations.
Implementation Details
The model is optimized through preference learning on the MMPR dataset. It supports several deployment options, including 16-bit (bf16/fp16) precision as well as 8-bit and 4-bit quantization for efficient inference.
- Advanced multimodal reasoning capabilities through MPO training
- Support for multiple image inputs and batch processing
- Streaming output capability for real-time generation
- Flexible deployment options with varying precision levels
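As a rough illustration of how those precision options might map onto Hugging Face `from_pretrained` keyword arguments, the sketch below builds the loading kwargs for each mode. This is a sketch under stated assumptions: the helper name is invented here, and the exact kwargs assume the standard transformers API (string `torch_dtype` aliases, `load_in_8bit`/`load_in_4bit` for bitsandbytes quantization).

```python
def load_kwargs(precision: str) -> dict:
    """Map a precision label to illustrative from_pretrained kwargs.

    Sketch only: assumes the Hugging Face transformers API, where
    torch_dtype accepts a string alias and load_in_8bit / load_in_4bit
    enable bitsandbytes quantization.
    """
    options = {
        "bf16": {"torch_dtype": "bfloat16"},
        "fp16": {"torch_dtype": "float16"},
        "8bit": {"load_in_8bit": True},
        "4bit": {"load_in_4bit": True},
    }
    if precision not in options:
        raise ValueError(f"unknown precision: {precision}")
    # trust_remote_code is needed because InternVL ships custom modeling code
    return {"trust_remote_code": True, **options[precision]}

# Usage (illustrative):
# AutoModel.from_pretrained("OpenGVLab/InternVL2-8B-MPO", **load_kwargs("bf16"))
```

In practice, bf16 is the natural default here since the released weights are stored in BF16; the quantized modes trade some accuracy for lower memory use.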
Core Capabilities
- Strong performance on MathVista (67.0% accuracy)
- Enhanced Chain-of-Thought reasoning
- Multilingual support
- Multi-image and video processing
- Real-time conversation abilities
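InternVL-family models typically handle high-resolution and multi-image input by splitting each image into a grid of fixed-size tiles whose layout approximates the image's aspect ratio. The sketch below shows a simplified version of that grid selection; the function name and tie-breaking rule are assumptions simplified from the family's published preprocessing code, which also enforces a minimum tile count and appends a thumbnail tile.

```python
def pick_tile_grid(width: int, height: int, max_tiles: int = 12) -> tuple[int, int]:
    """Choose a (cols, rows) tile grid whose aspect ratio best matches
    the input image, using at most max_tiles tiles.

    Simplified sketch of InternVL-style dynamic tiling, not the exact
    released preprocessing code.
    """
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles // rows + 1):
            diff = abs(cols / rows - aspect)
            if diff < best_diff:  # strict improvement keeps the smallest matching grid
                best, best_diff = (cols, rows), diff
    return best

# A 2:1 panorama maps to a 2x1 grid; a square image needs a single tile.
```

Each selected tile is then resized to the model's native input resolution before being fed to the vision encoder, which is what lets a fixed-resolution encoder handle arbitrarily shaped inputs.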
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its Mixed Preference Optimization approach, which significantly improves its reasoning capabilities, particularly in multimodal tasks. It achieves performance comparable to models 10 times its size on certain benchmarks.
Q: What are the recommended use cases?
The model excels in multimodal reasoning tasks, visual-linguistic conversations, mathematical problem-solving, and general image understanding. It's particularly suited for applications requiring strong reasoning capabilities with visual inputs.