BGE-VL-large

Maintained By
BAAI

Property      Value
Developer     BAAI
License       MIT
Paper         MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Release Date  December 2023

What is BGE-VL-large?

BGE-VL-large is a multimodal retrieval model trained on data produced by the MegaPairs data synthesis method. The training set contains over 26 million triplets, and the model reports state-of-the-art performance in zero-shot composed image retrieval along with strong results on multimodal embedding tasks. It is aimed at universal multimodal retrieval, where one model serves image, text, and combined image-text queries.

Implementation Details

The model is used through the Transformers library and can be integrated into existing pipelines. It accepts both image and text inputs, so a single model can serve a range of multimodal tasks. The architecture is designed for efficient encoding and accurate retrieval across modalities.

  • Built on the MegaPairs dataset with 26M+ heterogeneous KNN triplets
  • Supports both standalone image encoding and combined image-text queries
  • Implements efficient encoding and retrieval mechanisms
  • Provides normalized embeddings for accurate similarity matching
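
To make the last point concrete, the following is a minimal sketch of how normalized embeddings are typically compared during retrieval. It does not load BGE-VL-large or call its API. The vectors are hypothetical 4-dimensional stand-ins for real model outputs, and the helper names (l2_normalize, dot, rank_candidates) are illustrative assumptions, not part of the model's interface. Once vectors are scaled to unit length, a plain dot product equals cosine similarity, so candidates can be ranked by that score.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    q = l2_normalize(query)
    scored = [(name, dot(q, l2_normalize(vec))) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: in practice the query vector would come from
# encoding an image, or an image plus a text instruction, with the model.
query_embedding = [0.9, 0.1, 0.3, 0.0]
candidate_embeddings = {
    "candidate_1.png": [0.8, 0.2, 0.4, 0.1],
    "candidate_2.png": [0.0, 0.9, 0.1, 0.5],
    "candidate_3.png": [0.5, 0.5, 0.5, 0.5],
}

ranking = rank_candidates(query_embedding, candidate_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

If the model already returns unit-length embeddings, the normalization step is redundant but harmless, and the ranking reduces to a single matrix product between the query and the candidate embeddings.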

Core Capabilities

  • State-of-the-art performance in zero-shot composed image retrieval
  • Excellence in multimodal embedding tasks
  • Robust generalization across different domains
  • Efficient scaling with increased training data
  • Outperforms larger models while using fewer parameters

Frequently Asked Questions

Q: What makes this model unique?

BGE-VL-large achieves state-of-the-art performance in composed image retrieval while using significantly fewer parameters than competing models. It is trained on the MegaPairs dataset, which enables robust generalization across domains and performance that scales with additional training data.

Q: What are the recommended use cases?

The model excels in multimodal retrieval tasks, particularly in scenarios requiring composed image retrieval, cross-modal search, and universal multimodal embedding. It's especially suitable for applications needing precise image-text matching and retrieval capabilities.
