cde-small-v2

Maintained By
jxm

  • Parameter Count: 140M (effective)
  • MTEB Score: 65.58
  • Paper: Contextual Document Embeddings
  • Author: jxm

What is cde-small-v2?

cde-small-v2 is a cutting-edge embedding model that introduces a novel two-stage architecture for generating context-aware document embeddings. As of January 2025, it ranks as the best small model (under 400M parameters) on the MTEB leaderboard for text embedding models.

Implementation Details

The model employs a two-stage architecture: the first stage gathers dataset-level information by embedding a subset of the corpus, and the second stage embeds individual queries and documents conditioned on that context. Letting corpus-level context inform each embedding improves embedding quality over context-free encoders of similar size.

  • Uses ModernBERT as the base architecture
  • Implements residual connections between model stages
  • Features optimized pooling and position-embedding strategies
  • Trained on nomic-unsupervised dataset and fine-tuned on BGE dataset
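The two-stage flow above can be sketched with toy stand-ins. Nothing below is the model's actual code: `toy_embed` is a hash-seeded placeholder for the ModernBERT encoder, and the mean-pooling and 0.5 residual weight are illustrative assumptions, not the trained components.

```python
# Toy sketch of the two-stage idea: stage one pools a corpus subset into a
# "dataset embedding"; stage two embeds text conditioned on that context via
# a residual-style addition between stages.
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 8) -> np.ndarray:
    """Deterministic stand-in encoder: hash-seeded unit vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def stage_one(corpus_subset: list[str]) -> np.ndarray:
    """Stage 1: gather dataset information by embedding and pooling a corpus subset."""
    return np.mean([toy_embed(t) for t in corpus_subset], axis=0)

def stage_two(text: str, dataset_embedding: np.ndarray) -> np.ndarray:
    """Stage 2: embed a query/document, mixing in the dataset context
    (a residual connection between the two stages)."""
    v = toy_embed(text) + 0.5 * dataset_embedding
    return v / np.linalg.norm(v)

corpus = ["how to train a model", "embedding spaces explained", "retrieval basics"]
dataset_embedding = stage_one(corpus)
query_embedding = stage_two("what is an embedding?", dataset_embedding)
```

The point of the sketch is the data flow: the dataset embedding is computed once per corpus, then reused for every query and document embedded against that corpus.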

Core Capabilities

  • High-quality document and query embeddings
  • Context-aware embedding generation
  • Efficient two-stage processing
  • Support for both Transformers and Sentence Transformers implementations

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its two-stage architecture that naturally integrates context tokens into the embedding process, allowing for more nuanced and context-aware embeddings while maintaining a relatively small parameter count.

Q: What are the recommended use cases?

The model is particularly well-suited for document retrieval tasks, semantic search applications, and any use case requiring high-quality text embeddings with context awareness. It performs especially well when corpus information is available ahead of time.
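As a concrete, if toy, illustration of the retrieval use case: once embeddings exist, search reduces to similarity ranking. The vectors below are hand-made stand-ins for model outputs, not real cde-small-v2 embeddings.

```python
# Minimal retrieval sketch: rank documents against a query by cosine
# similarity over precomputed embedding vectors.
import numpy as np

docs = ["intro to neural networks", "baking sourdough bread", "transformer attention"]
# Pretend embeddings (one row per document), standing in for model outputs.
doc_embs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
])
query_emb = np.array([0.85, 0.2, 0.05])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query_emb, d) for d in doc_embs]
ranked = sorted(zip(scores, docs), reverse=True)  # best match first
best = ranked[0][1]
```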
