BGE-Visualized

Property            Value
Author              BAAI
Dimensions          768 (base) / 1024 (M3)
Paper               VISTA Paper
Models Available    bge-visualized-base-en-v1.5, bge-visualized-m3

What is bge-visualized?

BGE-Visualized is a universal multi-modal embedding model that extends the original BGE text embedding framework by incorporating image token embeddings. This enhancement enables the model to process both text and image inputs, making it particularly effective for hybrid modal retrieval tasks.

Implementation Details

The model comes in two variants: a base English version producing 768-dimensional embeddings and a multilingual version (M3) producing 1024-dimensional embeddings. It is built on BGE's text embedding capabilities while incorporating image processing from the EVA-CLIP architecture.

  • Seamless integration of text and image embeddings
  • Zero-shot performance across multiple retrieval tasks
  • Supports multilingual processing in the M3 version
  • Maintains original BGE text embedding capabilities
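
In code, loading the model follows the pattern documented in the BAAI/FlagEmbedding repository. The sketch below is illustrative rather than definitive: the import path and `Visualized_BGE` constructor mirror the repository's published example, and the checkpoint path is a placeholder you would point at the downloaded weight file.

```python
# Minimal loading sketch, following the usage example in the
# BAAI/FlagEmbedding repository. The .pth path is a placeholder for the
# Visualized_base_en_v1.5.pth checkpoint downloaded from the model page.
import torch
from FlagEmbedding.visual.modeling import Visualized_BGE

model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",       # underlying BGE text encoder
    model_weight="./Visualized_base_en_v1.5.pth", # fused vision + text weights
)
model.eval()

with torch.no_grad():
    # Text-only, image-only, and hybrid (image + text) inputs all map into
    # the same embedding space (768-dim here; 1024-dim for the M3 variant).
    text_emb = model.encode(text="a photo of a golden retriever")
    image_emb = model.encode(image="./dog.jpg")
    hybrid_emb = model.encode(image="./dog.jpg", text="running on a beach")
```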

Core Capabilities

  • Multi-Modal Knowledge Retrieval
  • Composed Image Retrieval
  • Knowledge Retrieval with Multi-Modal Queries
  • Multilingual Processing Support

Frequently Asked Questions

Q: What makes this model unique?

The model uniquely combines text and image processing while preserving the strong text embedding performance of the original BGE model. It is designed specifically for hybrid modal retrieval tasks and can be applied zero-shot across multiple benchmarks.

Q: What are the recommended use cases?

The model excels in tasks like multi-modal knowledge retrieval, composed image retrieval, and handling multi-modal queries. However, while it can perform cross-modal retrieval (text to image), this isn't its primary intended use case.
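
As a concrete illustration of composed image retrieval, the hedged sketch below encodes a query made of a reference image plus a textual modification instruction and scores it against candidate images. All file paths and the instruction text are placeholders, and it assumes (as in the repository's example) that `encode` returns L2-normalized embeddings, so a dot product acts as cosine similarity.

```python
# Hedged sketch of composed image retrieval with BGE-Visualized.
# Paths and the instruction text are placeholders; the setup mirrors the
# loading sketch above.
import torch
from FlagEmbedding.visual.modeling import Visualized_BGE

model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",
    model_weight="./Visualized_base_en_v1.5.pth",
)
model.eval()

with torch.no_grad():
    # Query = reference image + textual modification.
    query_emb = model.encode(
        image="./query_shirt.jpg",
        text="the same shirt, but in dark blue",
    )
    # Candidates are plain images.
    candidate_embs = torch.cat(
        [model.encode(image=p) for p in ("./candi_1.jpg", "./candi_2.jpg")],
        dim=0,
    )

# Assuming normalized embeddings (as in the repo example), the dot product
# equals cosine similarity; the highest-scoring candidate is the match.
scores = query_emb @ candidate_embs.T  # shape: (1, num_candidates)
best = int(scores.argmax(dim=-1))
print(scores, best)
```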
