BGE-VL-MLLM-S2

Maintained By
BAAI

Property      Value
Developer     BAAI
License       MIT License
Paper         MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Release Date  March 2024

What is BGE-VL-MLLM-S2?

BGE-VL-MLLM-S2 is a multimodal retrieval model that extends its predecessor, BGE-VL-MLLM-S1, with additional fine-tuning on the training set of the MMEB benchmark. It targets universal multimodal retrieval and is trained on the MegaPairs dataset, which contains over 26 million heterogeneous KNN triplets.

Implementation Details

The model is implemented with the Transformers library and accepts both image and text inputs. Queries and candidates are each encoded into embeddings, and retrieval is performed by scoring the similarity between them.

  • Built on the MegaPairs dataset with additional MMEB fine-tuning
  • Supports both image-to-image and text-image-to-image retrieval
  • Implements normalized embeddings for accurate similarity scoring
  • Provides an API for processing both queries and candidates
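
The role of normalized embeddings in similarity scoring can be illustrated with a small, model-free sketch. It does not call BGE-VL-MLLM-S2 or reproduce its API: the helper names `l2_normalize` and `rank_candidates` are hypothetical, and the three-dimensional vectors are made-up stand-ins for real query and candidate embeddings. The point is only that once vectors are scaled to unit length, a plain dot product equals cosine similarity, so candidates can be ranked by that score.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def rank_candidates(query_emb, candidate_embs):
    """Return (index, score) pairs sorted by descending cosine similarity."""
    q = l2_normalize(query_emb)
    scored = []
    for idx, cand in enumerate(candidate_embs):
        c = l2_normalize(cand)
        # For unit-length vectors the dot product is the cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, c))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for model outputs (illustrative values only).
query = [1.0, 2.0, 0.0]
candidates = [
    [2.0, 4.0, 0.0],  # same direction as the query, different magnitude
    [0.0, 0.0, 3.0],  # orthogonal to the query
    [1.0, 0.0, 0.0],  # partially aligned with the query
]

ranking = rank_candidates(query, candidates)
for idx, score in ranking:
    print(f"candidate {idx}: score {score:.4f}")
```

Because normalization removes magnitude, the first candidate ranks highest even though it is twice as long as the query. In a real pipeline the toy vectors would be replaced by embeddings produced by the model for the query (image plus text) and for each candidate image.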

Core Capabilities

  • State-of-the-art performance in composed image retrieval
  • Enhanced multimodal embedding across diverse tasks
  • 8.1% improvement over previous SOTA on CIRCO benchmark
  • Superior performance on MMEB out-of-distribution tasks
  • Efficient processing of both images and text queries

Frequently Asked Questions

Q: What makes this model unique?

BGE-VL-MLLM-S2 is distinguished mainly by its additional fine-tuning on MMEB, which is intended to improve performance across a broader range of multimodal embedding tasks than its predecessor covers. It is reported to achieve state-of-the-art results in multimodal retrieval while keeping parameter usage efficient.

Q: What are the recommended use cases?

The model is suited to composed image retrieval, multimodal search applications, and general-purpose image-text embedding tasks. It fits best where an image must be retrieved from a query that combines a reference image with a textual instruction.
