OmniLMM-12B

Maintained by: openbmb

  • Model Size: 11.6B parameters
  • License: Apache-2.0 (code), custom license for model parameters
  • Primary Task: Visual Question Answering
  • Architecture: EVA02-5B + Zephyr-7B-β with a perceiver resampler

What is OmniLMM-12B?

OmniLMM-12B is a state-of-the-art open-source multimodal language model that combines visual and language processing capabilities. Built on EVA02-5B (vision encoder) and Zephyr-7B-β (language model), it is designed to handle complex visual-language tasks with high accuracy and reliability. Its multimodal RLHF alignment makes its responses notably more trustworthy than those of comparable LMMs.

Implementation Details

The architecture couples the EVA02-5B vision encoder to the Zephyr-7B-β language model through a perceiver resampler layer, and the combined model is trained on multimodal data using a curriculum learning approach. It achieves impressive benchmark results, including leading performance on MME (1637), MMBench (71.6%), and Object HalBench (90.3%/95.5%). A minimal sketch of the resampler idea follows the list below.

  • Multimodal RLHF alignment for reduced hallucination
  • Real-time processing of video and speech streams
  • Curriculum-based training on diverse multimodal datasets
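The perceiver resampler is the bridge between the two backbones: a small set of learned latent queries cross-attends to the vision encoder's patch features and compresses them into a fixed number of tokens in the language model's embedding space. The PyTorch sketch below illustrates the general technique only; the dimensions, layer counts, and names are readability-oriented assumptions, not OmniLMM's actual implementation.

```python
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Compress a variable-length sequence of visual features into a
    fixed number of latent tokens via cross-attention (illustrative
    sketch; all hyperparameters are assumptions, not OmniLMM's config)."""

    def __init__(self, vis_dim=1024, lm_dim=4096, num_latents=64, num_heads=8):
        super().__init__()
        # Learned latent queries: the fixed-size "summary" slots.
        self.latents = nn.Parameter(torch.randn(num_latents, lm_dim))
        # Project vision-encoder features into the language model's width.
        self.vis_proj = nn.Linear(vis_dim, lm_dim)
        # Latents (queries) attend to projected visual tokens (keys/values).
        self.cross_attn = nn.MultiheadAttention(lm_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(lm_dim)
        self.ffn = nn.Sequential(
            nn.Linear(lm_dim, 4 * lm_dim), nn.GELU(), nn.Linear(4 * lm_dim, lm_dim)
        )

    def forward(self, vis_feats):
        # vis_feats: (batch, num_patches, vis_dim) from the vision encoder.
        b = vis_feats.size(0)
        kv = self.vis_proj(vis_feats)
        q = self.latents.unsqueeze(0).expand(b, -1, -1)
        attended, _ = self.cross_attn(q, kv, kv)
        x = self.norm(q + attended)
        # Output length is fixed regardless of the input patch count, so the
        # language model always sees a constant number of visual tokens.
        return x + self.ffn(x)

# Example: compress 257 ViT patch tokens into 64 LLM-width tokens.
feats = torch.randn(2, 257, 1024)
print(PerceiverResampler()(feats).shape)  # torch.Size([2, 64, 4096])
```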

Core Capabilities

  • Strong performance on multiple multimodal benchmarks
  • Trustworthy behavior with minimal hallucination
  • Real-time multimodal interaction with camera and microphone inputs
  • Outperforms GPT-4V on Object HalBench
  • Broad multimodal world knowledge

Frequently Asked Questions

Q: What makes this model unique?

OmniLMM-12B is the first state-of-the-art open-source LMM aligned via multimodal RLHF for trustworthy behavior, significantly reducing hallucination issues common in other models. It achieves this while maintaining competitive performance across various benchmarks.

Q: What are the recommended use cases?

The model excels in visual question answering, real-time multimodal interaction, and tasks requiring high accuracy in image understanding. It's particularly suitable for applications where factual grounding and reduced hallucination are crucial.
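For local inference, the upstream OpenBMB/OmniLMM GitHub repository documents a chat helper. The sketch below follows the pattern in that repository's README; the `chat` module, `OmniLMMChat`, and `img2base64` names are taken from it, but exact signatures may change between releases, so treat this as a hedged example and defer to the upstream README.

```python
import json

# Helper shipped in the OpenBMB/OmniLMM repository (run from its root);
# names follow that repo's README and are not a stable pip-installable API.
from chat import OmniLMMChat, img2base64

chat_model = OmniLMMChat('openbmb/OmniLMM-12B')  # downloads the 11.6B weights

# Encode an image and pose a visual question about it.
image_b64 = img2base64('./example.jpg')
msgs = [{"role": "user", "content": "What is unusual about this image?"}]

answer = chat_model.chat({"image": image_b64, "question": json.dumps(msgs)})
print(answer)
```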
