cde-small-v2


jxm

State-of-the-art small embedding model (140M parameters) that scores 65.58 on MTEB, using a two-stage architecture for contextual document embedding

Property          Value
Parameter Count   140M (effective)
MTEB Score        65.58
Paper             Contextual Document Embeddings
Author            jxm

What is cde-small-v2?

cde-small-v2 is an embedding model that uses a two-stage architecture to generate context-aware document embeddings. As of January 2025, it ranks as the best small model (under 400M parameters) on the MTEB leaderboard for text embedding models.

Implementation Details

The model uses a two-stage architecture. The first stage gathers dataset-level information by embedding a subset of the corpus; the second stage embeds the actual queries and documents, using that information as context. This lets corpus context shape every embedding and improves embedding quality.

  • Uses ModernBERT as the base architecture
  • Implements residual connections between model stages
  • Features optimized pooling and position-embedding strategies
  • Trained on the nomic-unsupervised dataset and fine-tuned on the BGE dataset
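The two-stage flow can be illustrated with a deliberately tiny toy. This is not cde-small-v2's implementation: the hashed bag-of-words encoder (standing in for ModernBERT), the hypothetical helper names `embed`, `first_stage` and `second_stage`, and the use of corpus-mean centering as the "context" are all assumptions. The sketch only shows the shape of the idea: a corpus sample is summarized once in stage one, and that summary then influences every embedding produced in stage two.

```python
import hashlib
import math

DIM = 16  # toy embedding width (assumption, not the model's real size)


def embed(text: str) -> list[float]:
    """Toy encoder: hashed bag-of-words counts (stand-in for ModernBERT)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # Stable hash so bucket assignments do not change between runs.
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec


def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def first_stage(corpus_sample: list[str]) -> list[float]:
    """Stage 1 (toy): summarize the corpus sample as its mean embedding."""
    vectors = [embed(doc) for doc in corpus_sample]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def second_stage(text: str, context: list[float]) -> list[float]:
    """Stage 2 (toy): embed text relative to the corpus context.

    Subtracting the corpus mean down-weights features that every document
    shares; this is a crude stand-in for the model's learned conditioning.
    """
    raw = embed(text)
    return normalize([r - c for r, c in zip(raw, context)])


corpus_sample = [
    "the patient was given aspirin",
    "the patient reported chest pain",
    "the patient was discharged",
]
context = first_stage(corpus_sample)  # computed once per corpus
doc_vec = second_stage("the patient was given aspirin", context)
```

In this toy, words like "the" and "patient" that appear in every sampled document contribute nothing after centering, so the embedding is driven by what distinguishes the document within its corpus.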

Core Capabilities

  • High-quality document and query embeddings
  • Context-aware embedding generation
  • Efficient two-stage processing
  • Usable through both the Transformers and Sentence Transformers libraries

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its two-stage architecture, which feeds context tokens derived from a corpus sample into the embedding process. This produces context-aware embeddings while keeping the parameter count relatively small.
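What "context tokens" means can be sketched with a single, untrained attention step in plain Python. This is an illustrative assumption, not the model's real architecture or weights: the function `contextual_embedding` is hypothetical, context vectors are simply prepended to the document's token vectors, each document position attends over the whole sequence, and the result is mean-pooled over document positions only. The point it demonstrates is that the same document yields different embeddings under different corpus context.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def contextual_embedding(doc_tokens, context_tokens):
    """Toy stage-2 pass: one attention step over [context; document] tokens.

    Each document position attends to every position, including the
    prepended context tokens; outputs are mean-pooled over document
    positions only, so context shapes the result without being pooled.
    """
    sequence = context_tokens + doc_tokens  # context tokens come first
    width = len(doc_tokens[0])
    scale = 1.0 / math.sqrt(width)
    outputs = []
    for query in doc_tokens:
        weights = softmax([dot(query, key) * scale for key in sequence])
        mixed = [sum(w * vec[i] for w, vec in zip(weights, sequence))
                 for i in range(width)]
        outputs.append(mixed)
    return [sum(out[i] for out in outputs) / len(outputs) for i in range(width)]


doc = [[1.0, 0.0], [0.0, 1.0]]
emb_a = contextual_embedding(doc, [[1.0, 0.0]])  # corpus context leans "a"
emb_b = contextual_embedding(doc, [[0.0, 1.0]])  # corpus context leans "b"
emb_none = contextual_embedding(doc, [])         # no context at all
```

Without context the toy document pools to [0.5, 0.5]; each context pulls the same document toward a different direction.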

Q: What are the recommended use cases?

The model is particularly well-suited for document retrieval tasks, semantic search applications, and any use case requiring high-quality text embeddings with context awareness. It performs especially well when corpus information is available ahead of time.
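When the corpus is known ahead of time, its context can be computed once and reused for every query. The sketch below shows that pattern for cosine-similarity retrieval over precomputed vectors. It is a toy under stated assumptions: the hypothetical `build_index` and `search` helpers, the hand-written 3-dimensional vectors, and the use of the corpus mean as the cached "context" are illustrative stand-ins, not cde-small-v2's API or method.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(doc_vectors):
    """Compute the corpus-level context once, then store adjusted doc vectors.

    The "context" here is just the corpus mean (a toy stand-in); the point is
    that it is computed a single time and reused for every later query.
    """
    dim = len(doc_vectors[0])
    mean = [sum(v[i] for v in doc_vectors) / len(doc_vectors) for i in range(dim)]
    docs = [normalize([x - m for x, m in zip(v, mean)]) for v in doc_vectors]
    return {"mean": mean, "docs": docs}


def search(index, query_vector, k=2):
    """Return the indices of the top-k documents by cosine similarity."""
    q = normalize([x - m for x, m in zip(query_vector, index["mean"])])
    scores = [sum(a * b for a, b in zip(q, d)) for d in index["docs"]]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


doc_vectors = [
    [0.9, 0.1, 0.5],
    [0.1, 0.9, 0.5],
    [0.5, 0.5, 0.9],
]
index = build_index(doc_vectors)          # corpus context cached here
top = search(index, [1.0, 0.0, 0.5], k=1)
```

Because the context lives in the index, the per-query cost is only embedding the query and scoring it; the corpus pass happens once.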
