nomic-embed-text-v1

Maintained By: nomic-ai

Property          Value
Parameter Count   137M
Context Length    8192 tokens
License           Apache 2.0
Paper             arXiv:2402.01613

What is nomic-embed-text-v1?

nomic-embed-text-v1 is an open-source text embedding model with an 8192-token context length. It outperforms OpenAI's text-embedding-ada-002 and text-embedding-3-small on both short-context and long-context tasks, scoring 62.39 on MTEB and 85.53 on LoCo.

Implementation Details

The model is trained in multiple stages, starting from a long-context BERT model. The first stage is unsupervised contrastive learning on diverse text pairs from sources such as StackExchange and Quora. The second stage is fine-tuning on high-quality labeled datasets. At inference time, each input needs a task instruction prefix for optimal performance.

  • Supports 8192 token context length with native scaling
  • Implements mean pooling for embedding generation
  • Available through multiple frameworks including Sentence Transformers and Transformers.js
  • Extended to multimodal use through nomic-embed-vision-v1
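
The mean pooling step listed above can be illustrated with a minimal, dependency-free sketch. It averages per-token vectors while skipping padding positions. The function name `mean_pool` and the toy vectors are hypothetical and for illustration only; this is not the model's actual implementation, which operates on batched tensors.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    # Keep only the vectors for real (non-padding) tokens.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Component-wise average over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)  # [2.0, 3.0]
```

The padding vector is excluded from both the sum and the count, so the result depends only on the real tokens.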

Core Capabilities

  • Document embedding for RAG applications
  • Query embedding for search tasks
  • Text clustering and semantic duplicate detection
  • Classification task embeddings
  • Cross-modal alignment with vision embeddings
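
Search and duplicate detection with embeddings usually come down to comparing vectors. The sketch below ranks documents against a query by cosine similarity, assuming the embeddings have already been computed. The helper names `cosine_similarity` and `rank_documents` and the two-dimensional toy vectors are hypothetical; real embeddings from the model have far more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for a query embedding and three document embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_documents(query, docs)
top_index = ranking[0][0]  # 1: the document pointing in the same direction as the query
```

For duplicate detection the same similarity function applies, with pairs above a chosen threshold flagged as candidates; the threshold is application-specific and not given in the source.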

Frequently Asked Questions

Q: What makes this model unique?

The model is open source under the Apache 2.0 license, supports an 8192-token context length, and scores higher on benchmarks than proprietary alternatives such as OpenAI's text-embedding-ada-002 and text-embedding-3-small.

Q: What are the recommended use cases?

The model is suited to RAG applications, semantic search, document clustering, and classification tasks. Each input needs the task prefix that matches the scenario (search_document, search_query, clustering, or classification) for optimal performance.
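
A minimal sketch of how the task prefixes might be applied before embedding follows. It assumes the prefix is prepended as `"<task>: "` and that the model is loaded through Sentence Transformers under the Hugging Face identifier `nomic-ai/nomic-embed-text-v1` with `trust_remote_code=True`; check both assumptions against the official model card. The helper names `with_task_prefix` and `embed` are hypothetical.

```python
from typing import List

# Task names as listed in the answer above.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def with_task_prefix(task: str, text: str) -> str:
    """Prepend the task instruction prefix, assuming the "<task>: <text>" form."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {TASK_PREFIXES}")
    return f"{task}: {text}"


def embed(texts: List[str], task: str):
    """Embed texts with the given task prefix (downloads the model on first use)."""
    # Imported lazily so the prefix helper works without the third-party package.
    from sentence_transformers import SentenceTransformer

    # Assumed model identifier and loading flag; verify against the model card.
    model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
    return model.encode([with_task_prefix(task, t) for t in texts])


query_input = with_task_prefix("search_query", "What is mean pooling?")
doc_input = with_task_prefix("search_document", "Mean pooling averages token vectors.")
```

In a retrieval setup, documents would be embedded once with `search_document` and each incoming query with `search_query`, so the two sides of the search use their matching prefixes.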
