MiniCPM-Llama3-V-2_5

Maintained By
openbmb


| Property | Value |
|---|---|
| Parameter Count | 8.54B |
| Model Type | Image-Text-to-Text |
| Architecture | SigLip-400M + Llama3-8B-Instruct |
| License | Apache-2.0 (code), custom license for weights |
| Tensor Type | FP16 |

What is MiniCPM-Llama3-V-2_5?

MiniCPM-Llama3-V-2_5 is a multimodal language model that achieves GPT-4V-level performance while remaining compact enough to run on mobile devices. Built from a SigLip-400M vision encoder paired with Llama3-8B-Instruct, it represents a significant step toward making powerful multimodal AI practical on edge devices.

Implementation Details

The model combines vision and language processing in a single pipeline. It accepts images of up to 1.8 million pixels, and it can be deployed efficiently on end-side hardware through quantization and other inference optimizations.

  • Achieves an average score of 65.1 on OpenCompass across 11 benchmarks
  • Supports 30+ languages, including German, French, Spanish, Italian, Korean, and Japanese
  • Scores 700+ on OCRBench, surpassing many proprietary models
  • Uses RLAIF-V alignment, reducing the hallucination rate to 10.3%
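Because input images are capped at roughly 1.8 million pixels, oversized images should be downscaled before inference. The sketch below is a hypothetical pre-processing helper (the name `fit_to_pixel_budget` is our own, not part of the model's API) that computes an aspect-preserving target size within that budget:

```python
import math

# Approximate pixel budget stated for MiniCPM-Llama3-V-2_5
MAX_PIXELS = 1_800_000

def fit_to_pixel_budget(width: int, height: int,
                        max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return (new_width, new_height), scaled down with the aspect
    ratio preserved, so that width * height <= max_pixels.
    Images already within the budget are returned unchanged."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))
```

The resulting size can then be passed to any image library's resize call before handing the image to the model.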

Core Capabilities

  • Advanced OCR capabilities with full-text extraction
  • Table-to-markdown conversion
  • Multi-language support and processing
  • Real-time video understanding
  • Efficient mobile deployment through NPU acceleration
  • Streaming output support
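To illustrate the table-to-markdown capability, the snippet below sketches the GitHub-flavored markdown layout such transcriptions typically take. The helper `rows_to_markdown` is our own illustration of the output format, not part of the model or its API:

```python
def rows_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Render a header and data rows as a GitHub-flavored markdown
    table, the kind of text a table-transcription prompt yields."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```

For example, `rows_to_markdown(["Item", "Qty"], [["apples", "3"]])` produces a two-column table whose first line is `| Item | Qty |`.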

Frequently Asked Questions

Q: What makes this model unique?

The model combines GPT-4V-level performance with mobile-first optimization, making it among the first end-side MLLMs to reach this level of performance while remaining deployable on phones and tablets.

Q: What are the recommended use cases?

The model excels in document processing, multilingual communication, visual understanding tasks, and mobile applications requiring real-time image and text processing.
