# BGE-Visualized
| Property | Value |
|---|---|
| Author | BAAI |
| Dimensions | 768 (base) / 1024 (M3) |
| Paper | VISTA Paper |
| Models Available | bge-visualized-base-en-v1.5, bge-visualized-m3 |
## What is bge-visualized?
BGE-Visualized is a universal multi-modal embedding model that extends the original BGE text embedding framework with image token embeddings. Because text, images, and combined image-plus-text inputs are all mapped into a single vector space, the model is particularly effective for hybrid-modal retrieval tasks.
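As a concrete illustration, here is a minimal sketch of encoding a hybrid image-plus-text query with the Visualized_BGE wrapper from BAAI's FlagEmbedding repository. The import path has changed across releases, and the weight-file and image paths below are placeholder assumptions.

```python
import torch

# The Visualized_BGE wrapper ships with BAAI's FlagEmbedding repository;
# the exact import path may differ between releases.
from FlagEmbedding.visual_bge.modeling import Visualized_BGE

# Base English variant: a BGE text backbone plus an EVA-CLIP image encoder.
# "Visualized_base_en_v1.5.pth" is the separately downloaded weight file
# (placeholder local path).
model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",
    model_weight="./Visualized_base_en_v1.5.pth",
)
model.eval()

with torch.no_grad():
    # Hybrid query: an image plus a textual instruction, embedded into the
    # same 768-dimensional space as plain BGE text embeddings.
    query_emb = model.encode(image="./query.png", text="a red version of this chair")
    # A text-only candidate lives in the same space.
    doc_emb = model.encode(text="Red fabric armchair with wooden legs")

# Embeddings come back unit-normalized, so a dot product is cosine similarity.
score = (query_emb @ doc_emb.T).item()
print(f"similarity: {score:.4f}")
```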
## Implementation Details
The model comes in two variants: a base English version producing 768-dimensional embeddings and a multilingual version (M3) producing 1024-dimensional embeddings. Both build on BGE's text embedding backbone and incorporate image processing from the EVA-CLIP architecture; a loading sketch for the M3 variant appears after the feature list below. Key features:
- Seamless integration of text and image embeddings
- Strong zero-shot performance across multiple retrieval tasks
- Multilingual support in the M3 version
- Retains the original BGE text embedding capabilities
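Loading the multilingual M3 variant follows the same pattern as the base model; only the backbone name and weight file change (both paths below are assumptions, as before).

```python
import torch
from FlagEmbedding.visual_bge.modeling import Visualized_BGE  # import path may vary

# M3 variant: multilingual bge-m3 backbone, 1024-dimensional output.
model_m3 = Visualized_BGE(
    model_name_bge="BAAI/bge-m3",
    model_weight="./Visualized_m3.pth",  # placeholder local weight file
)
model_m3.eval()

with torch.no_grad():
    # Non-English text is supported by the M3 variant.
    emb = model_m3.encode(text="una silla roja de madera")
print(emb.shape)  # expected: torch.Size([1, 1024])
```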
## Core Capabilities
- Multi-Modal Knowledge Retrieval
- Composed Image Retrieval (see the example below)
- Knowledge Retrieval with Multi-Modal Queries
- Multilingual processing support
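To make the composed image retrieval case concrete, the sketch below ranks candidate images against a reference image plus a textual modification. It reuses the (assumed) wrapper interface from the earlier examples, and all file paths are placeholders.

```python
import torch
from FlagEmbedding.visual_bge.modeling import Visualized_BGE  # import path may vary

model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",
    model_weight="./Visualized_base_en_v1.5.pth",  # placeholder local weight file
)
model.eval()

candidates = ["./candi_1.png", "./candi_2.png", "./candi_3.png"]  # placeholder paths

with torch.no_grad():
    # Composed query: a reference image plus the change the user asks for.
    query_emb = model.encode(image="./reference.png", text="make the background dark")
    candi_embs = torch.cat([model.encode(image=p) for p in candidates], dim=0)

# Rank candidates by cosine similarity (embeddings are unit-normalized).
scores = (query_emb @ candi_embs.T).squeeze(0)
for path, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.4f}  {path}")
```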
## Frequently Asked Questions
### Q: What makes this model unique?
The model combines text and image processing in a single embedding space while preserving the strong text embedding performance of the original BGE models. It is designed specifically for hybrid-modal retrieval tasks and performs well zero-shot across multiple retrieval benchmarks.
### Q: What are the recommended use cases?
The model excels at multi-modal knowledge retrieval, composed image retrieval, and knowledge retrieval with multi-modal queries. It can also perform cross-modal retrieval (text to image), though that is not its primary intended use case.
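For completeness, a text-to-image query looks like the sketch below: the query is text-only and the candidates are image-only, so no composed input is involved. It relies on the same assumed interface and placeholder paths as the earlier examples.

```python
import torch
from FlagEmbedding.visual_bge.modeling import Visualized_BGE  # import path may vary

model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",
    model_weight="./Visualized_base_en_v1.5.pth",  # placeholder local weight file
)
model.eval()

with torch.no_grad():
    text_query = model.encode(text="a dog playing in the snow")
    image_docs = torch.cat(
        [model.encode(image=p) for p in ("./img_a.png", "./img_b.png")], dim=0
    )

# Higher dot product = closer match (embeddings are unit-normalized).
print((text_query @ image_docs.T).squeeze(0).tolist())
```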