MiniCPM-Llama3-V-2_5

MiniCPM-Llama3-V-2_5

openbmb

GPT-4V level multimodal LLM with 8.54B params, supporting 30+ languages. Excels in OCR, visual understanding, and efficient mobile deployment. State-of-the-art performance on multiple benchmarks.

PropertyValue
Parameter Count8.54B
Model TypeImage-Text-to-Text
ArchitectureSigLip-400M + Llama3-8B-Instruct
LicenseApache-2.0 (code), Custom for weights
Tensor TypeFP16

What is MiniCPM-Llama3-V-2_5?

MiniCPM-Llama3-V-2_5 is a groundbreaking multimodal language model that achieves GPT-4V level performance while being compact enough to run on mobile devices. Built on SigLip-400M and Llama3-8B-Instruct architectures, it represents a significant advancement in making powerful AI accessible on edge devices.

Implementation Details

The model implements advanced features through a combination of vision and language processing capabilities. It can process images up to 1.8 million pixels and supports real-time processing with optimized performance through various quantization techniques.

  • Achieves 65.1 average score on OpenCompass across 11 benchmarks
  • Supports 30+ languages including German, French, Spanish, Italian, Korean, and Japanese
  • Features 700+ score on OCRBench, surpassing many proprietary models
  • Implements RLAIF-V technology with only 10.3% hallucination rate

Core Capabilities

  • Advanced OCR capabilities with full-text extraction
  • Table-to-markdown conversion
  • Multi-language support and processing
  • Real-time video understanding
  • Efficient mobile deployment through NPU acceleration
  • Streaming output support

Frequently Asked Questions

Q: What makes this model unique?

The model combines GPT-4V level performance with mobile-first optimization, making it the first end-side MLLM to achieve such high performance while being deployable on phones and tablets.

Q: What are the recommended use cases?

The model excels in document processing, multilingual communication, visual understanding tasks, and mobile applications requiring real-time image and text processing.

Socials
PromptLayer
Company
All services online
Location IconPromptLayer is located in the heart of New York City
PromptLayer © 2026