# BGE-Visualized
| Property | Value |
|---|---|
| Author | BAAI |
| Dimensions | 768 (base) / 1024 (M3) |
| Paper | VISTA Paper |
| Models Available | bge-visualized-base-en-v1.5, bge-visualized-m3 |
## What is bge-visualized?
BGE-Visualized is a universal multi-modal embedding model that extends the original BGE text embedding framework with image token embeddings. Because text, images, and combined image-plus-text inputs are all mapped into a single vector space, the model is particularly effective for hybrid-modal retrieval tasks.
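As a concrete illustration, here is a minimal sketch of encoding a hybrid image-plus-text query with the Visualized_BGE wrapper from BAAI's FlagEmbedding repository. The import path has changed across releases, and the weight-file and image paths below are placeholder assumptions.

```python
import torch

# The Visualized_BGE wrapper ships with BAAI's FlagEmbedding repository;
# the exact import path may differ between releases.
from FlagEmbedding.visual_bge.modeling import Visualized_BGE

# Base English variant: a BGE text backbone plus an EVA-CLIP image encoder.
# "Visualized_base_en_v1.5.pth" is the separately downloaded weight file
# (placeholder local path).
model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",
    model_weight="./Visualized_base_en_v1.5.pth",
)
model.eval()

with torch.no_grad():
    # Hybrid query: an image plus a textual instruction, embedded into the
    # same 768-dimensional space as plain BGE text embeddings.
    query_emb = model.encode(image="./query.png", text="a red version of this chair")
    # A text-only candidate lives in the same space.
    doc_emb = model.encode(text="Red fabric armchair with wooden legs")

# Embeddings come back unit-normalized, so a dot product is cosine similarity.
score = (query_emb @ doc_emb.T).item()
print(f"similarity: {score:.4f}")
```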
## Implementation Details
The model comes in two variants: a base English version producing 768-dimensional embeddings and a multilingual version (M3) producing 1024-dimensional embeddings. Both build on BGE's text embedding backbone and incorporate image processing from the EVA-CLIP architecture; a loading sketch for the M3 variant appears after the feature list below. Key features:
- Seamless integration of text and image embeddings
- Strong zero-shot performance across multiple retrieval tasks
- Multilingual support in the M3 version
- Retains the original BGE text embedding capabilities
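Loading the multilingual M3 variant follows the same pattern as the base model; only the backbone name and weight file change (both paths below are assumptions, as before).

```python
import torch
from FlagEmbedding.visual_bge.modeling import Visualized_BGE  # import path may vary

# M3 variant: multilingual bge-m3 backbone, 1024-dimensional output.
model_m3 = Visualized_BGE(
    model_name_bge="BAAI/bge-m3",
    model_weight="./Visualized_m3.pth",  # placeholder local weight file
)
model_m3.eval()

with torch.no_grad():
    # Non-English text is supported by the M3 variant.
    emb = model_m3.encode(text="una silla roja de madera")
print(emb.shape)  # expected: torch.Size([1, 1024])
```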
## Core Capabilities
- Multi-Modal Knowledge Retrieval
- Composed Image Retrieval (see the example below)
- Knowledge Retrieval with Multi-Modal Queries
- Multilingual processing support
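To make the composed image retrieval case concrete, the sketch below ranks candidate images against a reference image plus a textual modification. It reuses the (assumed) wrapper interface from the earlier examples, and all file paths are placeholders.

```python
import torch
from FlagEmbedding.visual_bge.modeling import Visualized_BGE  # import path may vary

model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",
    model_weight="./Visualized_base_en_v1.5.pth",  # placeholder local weight file
)
model.eval()

candidates = ["./candi_1.png", "./candi_2.png", "./candi_3.png"]  # placeholder paths

with torch.no_grad():
    # Composed query: a reference image plus the change the user asks for.
    query_emb = model.encode(image="./reference.png", text="make the background dark")
    candi_embs = torch.cat([model.encode(image=p) for p in candidates], dim=0)

# Rank candidates by cosine similarity (embeddings are unit-normalized).
scores = (query_emb @ candi_embs.T).squeeze(0)
for path, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.4f}  {path}")
```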
## Frequently Asked Questions
### Q: What makes this model unique?
The model combines text and image processing in a single embedding space while preserving the strong text embedding performance of the original BGE models. It is designed specifically for hybrid-modal retrieval tasks and performs well zero-shot across multiple retrieval benchmarks.
### Q: What are the recommended use cases?
The model excels at multi-modal knowledge retrieval, composed image retrieval, and knowledge retrieval with multi-modal queries. It can also perform cross-modal retrieval (text to image), though that is not its primary intended use case.
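For completeness, a text-to-image query looks like the sketch below: the query is text-only and the candidates are image-only, so no composed input is involved. It relies on the same assumed interface and placeholder paths as the earlier examples.

```python
import torch
from FlagEmbedding.visual_bge.modeling import Visualized_BGE  # import path may vary

model = Visualized_BGE(
    model_name_bge="BAAI/bge-base-en-v1.5",
    model_weight="./Visualized_base_en_v1.5.pth",  # placeholder local weight file
)
model.eval()

with torch.no_grad():
    text_query = model.encode(text="a dog playing in the snow")
    image_docs = torch.cat(
        [model.encode(image=p) for p in ("./img_a.png", "./img_b.png")], dim=0
    )

# Higher dot product = closer match (embeddings are unit-normalized).
print((text_query @ image_docs.T).squeeze(0).tolist())
```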