cde-small-v1

jxm

State-of-the-art small embedding model (281M parameters) achieving a 65.0 MTEB score through an innovative contextual document embedding approach

  • Parameter Count: 281M parameters
  • Model Type: Contextual Document Embeddings
  • Paper: ArXiv Paper
  • MTEB Score: 65.00 (best for models under 400M parameters)

What is cde-small-v1?

CDE-small-v1 is a groundbreaking text embedding model that introduces a novel two-stage approach to document embedding. It naturally integrates "context tokens" into the embedding process, achieving state-of-the-art performance on the MTEB leaderboard for models under 400M parameters.

Implementation Details

The model operates in two distinct stages: First, it gathers dataset information by embedding a subset of the corpus using a first-stage model. Second, it embeds queries and documents while conditioning on the corpus information from the first stage. This innovative approach allows the model to maintain context awareness while generating embeddings.

  • Two-stage architecture for context-aware embeddings
  • Compatible with both Transformers and Sentence-Transformers libraries
  • Supports task-specific prefixes for optimal performance
  • Conditions on a fixed set of exactly 512 context documents
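The two-stage flow described above can be sketched abstractly. Note this is a conceptual illustration, not the model's real API: `embed_stage1` and `embed_stage2` are hypothetical stand-ins for the two encoders, with random vectors in place of learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_stage1(texts):
    # Stage 1 (hypothetical): embed each corpus document independently.
    return rng.normal(size=(len(texts), 64))

def embed_stage2(texts, dataset_embeddings):
    # Stage 2 (hypothetical): embed texts while conditioning on the
    # corpus information gathered in stage 1 (shown here as a simple
    # additive shift by the pooled corpus embedding).
    context = dataset_embeddings.mean(axis=0)
    base = rng.normal(size=(len(texts), 64))
    return base + context

# Stage 1: embed the fixed subset of 512 corpus documents.
minicorpus = [f"doc {i}" for i in range(512)]
dataset_embeddings = embed_stage1(minicorpus)

# Stage 2: embed documents and queries conditioned on that corpus context.
doc_vecs = embed_stage2(["some document"], dataset_embeddings)
query_vecs = embed_stage2(["some query"], dataset_embeddings)
print(doc_vecs.shape, query_vecs.shape)  # (1, 64) (1, 64)
```

In the real model, the stage-1 outputs are passed to the second-stage encoder so that every document and query embedding is computed relative to the same corpus context.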

Core Capabilities

  • State-of-the-art performance on MTEB benchmark
  • Efficient document and query embedding generation
  • Robust performance even without specific corpus information
  • Specialized handling of retrieval tasks through prefix prompting
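Prefix prompting for retrieval amounts to prepending a task marker to each input before encoding. A minimal sketch follows; the exact prefix strings are an assumption modeled on common embedding-model conventions, so verify them against the model card before use.

```python
# Assumed prefix strings -- check the model card for the exact values.
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "

def with_prefix(texts, prefix):
    # Prepend the task-specific prefix to every input string.
    return [prefix + t for t in texts]

queries = with_prefix(["how do contextual embeddings work?"], QUERY_PREFIX)
docs = with_prefix(["CDE embeds documents in two stages."], DOCUMENT_PREFIX)
print(queries[0])
```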

Frequently Asked Questions

Q: What makes this model unique?

The model's two-stage approach and context-aware embedding generation set it apart, allowing it to achieve superior performance with a relatively small parameter count of 281M.

Q: What are the recommended use cases?

The model excels in document retrieval, semantic search, and text similarity tasks. It's particularly effective when you can provide corpus-specific context through the first-stage embedding process.
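Once query and document embeddings are produced, retrieval and semantic search typically reduce to ranking documents by cosine similarity. A generic sketch with placeholder vectors (not embeddings from the model):

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    # Normalize, score by dot product, and return indices best-first.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores), scores

doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query_vec = np.array([1.0, 0.1])
order, scores = cosine_rank(query_vec, doc_vecs)
print(order)  # indices of documents, most similar first
```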

Q: How does it handle unknown corpora?

While the model performs best with corpus-specific context, it can still function effectively using embeddings of random strings as context, at the cost of only a minor performance drop (65.0 to 63.8 on MTEB).
