imp-v1-3b

Maintained By
MILVLG


Property          Value
Parameter Count   3.19B
Model Type        Multimodal Small Language Model
License           Apache 2.0
Architecture      Phi-2 (2.7B) + SigLIP Visual Encoder (0.4B)

What is imp-v1-3b?

imp-v1-3b is a multimodal small language model that combines Microsoft's Phi-2 language model with Google's SigLIP visual encoder. Developed by MILVLG at Hangzhou Dianzi University, it performs strongly for its compact size, matching or exceeding some larger 7B-parameter models on several benchmarks.

Implementation Details

The model uses a hybrid architecture that integrates a 2.7B-parameter language model (Phi-2) with a 0.4B-parameter visual encoder (SigLIP). It was trained on the LLaVA-v1.5 training data, accepts both text and image inputs, and runs in FP16 precision.
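The description above does not say how SigLIP's image features reach Phi-2. The following is a minimal sketch assuming a LLaVA-v1.5-style design: each visual patch feature passes through a two-layer MLP projector (Linear, GELU, Linear) into the language model's embedding space, and the projected tokens replace an `<image>` placeholder in the text sequence. The helper names, toy dimensions, and weights are hypothetical; real dimensions are far larger and biases are omitted for brevity.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (out_dim x in_dim) by vector x (in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gelu(x: float) -> float:
    """Exact (erf-based) GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def project(feature: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Two-layer MLP projector (no biases): vision dim -> hidden -> LM dim."""
    hidden = [gelu(h) for h in matvec(w1, feature)]
    return matvec(w2, hidden)


def splice_visual_tokens(
    text_embeds: List[Vector],
    image_pos: int,
    visual_feats: List[Vector],
    w1: Matrix,
    w2: Matrix,
) -> List[Vector]:
    """Replace the placeholder embedding at image_pos with projected visual tokens."""
    visual_embeds = [project(f, w1, w2) for f in visual_feats]
    return text_embeds[:image_pos] + visual_embeds + text_embeds[image_pos + 1:]


# Toy sizes: 3-dim "SigLIP" features, 4-dim hidden layer, 2-dim "Phi-2" embeddings.
w1 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2], [0.3, 0.1, 0.0], [-0.2, 0.4, 0.1]]
w2 = [[0.5, -0.5, 0.25, 0.1], [0.2, 0.3, -0.1, 0.4]]
text = [[1.0, 0.0], [9.9, 9.9], [0.0, 1.0], [0.5, 0.5]]  # index 1 stands in for <image>
patches = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [-1.0, 0.5, 2.0]]

seq = splice_visual_tokens(text, 1, patches, w1, w2)
print(len(seq))  # 3 text tokens + 3 visual tokens
```

The point of the design is that, after projection, visual tokens look like ordinary token embeddings to the language model, so the decoder itself needs no architectural change.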

  • Efficient architecture combining language and vision capabilities
  • Training based on LLaVA-v1.5 methodology
  • Optimized for deployment on mobile devices
  • Compatible with the Hugging Face Transformers ecosystem
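Because training follows LLaVA-v1.5, prompts are likely expected in the LLaVA-v1.5 (Vicuna v1) single-turn chat format, with an `<image>` tag marking where visual tokens are inserted. The sketch below builds such a prompt; `build_prompt` and the system string are assumptions to check against the official model card. The Hugging Face checkpoint ships custom modeling code, so loading it typically requires `trust_remote_code=True`; consult the model card for the exact image-preprocessing and generation calls.

```python
# Hypothetical helper: assumes the LLaVA-v1.5 (Vicuna v1) conversation format.
DEFAULT_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(question: str, with_image: bool = True, system: str = DEFAULT_SYSTEM) -> str:
    """Return a single-turn prompt; '<image>' marks where visual tokens go."""
    image_tag = "<image>\n" if with_image else ""
    return f"{system} USER: {image_tag}{question} ASSISTANT:"


prompt = build_prompt("What are the colors of the bus in the image?")
print(prompt)
```

The prompt ends at `ASSISTANT:` so that the model's continuation is the answer itself.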

Core Capabilities

  • Achieves 81.42% accuracy on the VQAv2 benchmark
  • Outperforms models of similar size on 9 benchmarks
  • Excels in visual question answering tasks
  • Supports detailed image-text interactions
  • Optimized for resource-constrained environments
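VQAv2 results such as the 81.42% figure are conventionally reported with the VQA accuracy metric: a predicted answer gets full credit if at least 3 of the human annotators gave it, with the score averaged over every leave-one-annotator-out subset of the (usually 10) answers. A minimal sketch of that metric follows; it omits the official answer normalization (punctuation, articles, number words), so its scores can differ slightly from the official evaluation script.

```python
from typing import List


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Score one prediction against the human answers (10 in VQAv2).

    For each leave-one-annotator-out subset, the score is
    min(matches / 3, 1); the final score averages over all subsets.
    Answer normalization (punctuation, articles, number words) is omitted.
    """
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


def benchmark_score(per_question: List[float]) -> float:
    """Dataset-level accuracy in percent, the form used in reported results."""
    return 100.0 * sum(per_question) / len(per_question)


answers = ["red"] * 3 + ["maroon"] * 7
print(vqa_accuracy("Red", answers))  # 3 of 10 annotators agree
```

With 3 of 10 annotators agreeing, the score is about 0.9 rather than 1.0, because the three subsets that drop one of the agreeing annotators leave only 2 matches.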

Frequently Asked Questions

Q: What makes this model unique?

imp-v1-3b stands out for achieving performance comparable to 7B-parameter models with only about 3B parameters, which makes it well suited to mobile and other resource-constrained applications. Its architecture combines vision and language capabilities in a compact form factor.
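A rough way to see why the smaller parameter count matters on devices is to estimate the memory needed just to hold the weights. This sketch multiplies parameter count by bytes per parameter; it ignores activations, the KV cache, and runtime overhead, and the int8/int4 rows are generic storage formats, not a claim about which quantized imp-v1-3b builds exist.

```python
# Bytes per parameter for common storage formats (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("imp-v1-3b", 3.19e9), ("generic 7B model", 7e9)]:
    print(f"{name}: {weight_memory_gb(params):.2f} GB in FP16")
```

This prints 6.38 GB for imp-v1-3b versus 14.00 GB for a 7B model, so at FP16 the 3B model needs less than half the weight memory.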

Q: What are the recommended use cases?

The model is particularly well suited to visual question answering, image understanding, and other multimodal applications where resource efficiency matters, such as mobile devices and robots that need strong visual-language understanding.
