nomic-embed-text-v1.5

nomic-ai

A production-ready text embedding model with 137M parameters, featuring Matryoshka architecture for flexible dimensionality (64-768) and long context support up to 8192 tokens.

Property         Value
---------------  ---------------------------------------------------------------
Parameters       137M
License          Apache 2.0
Context Length   8192 tokens
Paper            Nomic Embed: Training a Reproducible Long Context Text Embedder

What is nomic-embed-text-v1.5?

nomic-embed-text-v1.5 is an advanced text embedding model that implements Matryoshka Representation Learning, allowing flexible dimensionality reduction from 768 to as low as 64 dimensions with minimal performance impact. The model supports long-context processing up to 8192 tokens and is specifically designed for production deployment in search, clustering, and classification tasks.

Implementation Details

The model is trained with a multi-stage pipeline, starting from a long-context BERT model and combining unsupervised contrastive learning with supervised fine-tuning. For best results, inputs must carry a task instruction prefix (search_document, search_query, clustering, or classification), prepended to the text and separated from it by a colon.
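As an illustration, a small (hypothetical) helper for applying these prefixes might look like the following; the prefix names come from the model card above, while the helper itself is just a sketch:

```python
# Task instruction prefixes expected by nomic-embed-text-v1.5.
# A prefix is prepended to the raw text, separated by ": ".
TASK_PREFIXES = {"search_document", "search_query", "clustering", "classification"}

def with_prefix(task: str, text: str) -> str:
    """Prepend the task instruction prefix required by the model."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task}")
    return f"{task}: {text}"

# Example: queries and documents get different prefixes in a search setting.
query = with_prefix("search_query", "who founded Nomic AI?")
doc = with_prefix("search_document", "Nomic AI is an AI company.")
```

Using the wrong prefix (or none at all) typically degrades retrieval quality, since the model was trained to condition on these instructions.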

  • Supports dynamic dimensionality scaling (768, 512, 256, 128, 64)
  • Achieves 62.28 MTEB score at full dimensionality
  • Includes built-in sequence length scaling
  • Multimodal compatibility with nomic-embed-vision-v1
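The dimensionality scaling above boils down to truncating the full 768-dimensional embedding and re-normalizing it to unit length. A minimal NumPy sketch of that step is shown below; note this is a simplified illustration (Nomic's reference code also applies a layer norm before slicing), and the function name is our own:

```python
import numpy as np

def shrink_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Truncate Matryoshka embeddings to `dim` and re-normalize to unit length.

    `embeddings` has shape (n, 768); `dim` is one of the supported sizes.
    """
    if dim not in (768, 512, 256, 128, 64):
        raise ValueError("unsupported target dimensionality")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms
```

Because Matryoshka training front-loads information into the leading dimensions, cosine similarities computed on the truncated vectors stay close to those at full dimensionality, which is what makes the 64- to 256-dim settings attractive for memory-constrained vector stores.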

Core Capabilities

  • Document and query embedding for search applications
  • Semantic clustering and duplicate detection
  • Text classification
  • Cross-modal alignment with vision embeddings
  • Efficient resource usage through dimensional flexibility

Frequently Asked Questions

Q: What makes this model unique?

The model's Matryoshka architecture allows users to dynamically adjust embedding dimensions without retraining, making it highly flexible for different deployment scenarios while maintaining strong performance. Additionally, its long context support and task-specific instruction prefixes enhance its versatility.

Q: What are the recommended use cases?

The model excels in production deployments for RAG applications, semantic search, document clustering, and classification tasks. Its dimensional flexibility makes it particularly suitable when storage or latency budgets vary, since smaller embeddings trade a small amount of accuracy for substantially lower cost.
