snowflake-arctic-embed-m-long

Maintained By
Snowflake

Snowflake Arctic-Embed-M-Long

Parameters: 137M
Embedding Dimension: 768
Max Context Length: 8192 tokens (with RPE)
License: Apache-2.0
Paper: Technical Report

What is snowflake-arctic-embed-m-long?

Snowflake-arctic-embed-m-long is a state-of-the-art text embedding model designed for long-context retrieval tasks. Built on the nomic-ai/nomic-embed-text-v1-unsupervised architecture, it achieves an MTEB Retrieval score (NDCG@10) of 54.83, outperforming comparably sized embedding models.
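
For reference, here is a minimal retrieval sketch using sentence-transformers. The repository name and the query prefix follow the conventions published for the Arctic-Embed family; treat both as assumptions to verify against the official model card.

```python
# Minimal retrieval sketch (assumes: pip install sentence-transformers).
# trust_remote_code=True is needed because the checkpoint ships custom modeling code.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "Snowflake/snowflake-arctic-embed-m-long", trust_remote_code=True
)

# Arctic-Embed convention (assumed here): queries get a retrieval prefix,
# documents are encoded as-is.
query_prefix = "Represent this sentence for searching relevant passages: "
queries = [query_prefix + "what is a cloud data warehouse?"]
documents = [
    "A cloud data warehouse is a managed service for analytics at scale.",
    "Tacos al pastor originated in central Mexico.",
]

query_emb = model.encode(queries, normalize_embeddings=True)
doc_emb = model.encode(documents, normalize_embeddings=True)

# Dot product of normalized vectors = cosine similarity; higher is more relevant.
scores = query_emb @ doc_emb.T
print(scores)  # shape (1, 2); the first document should score higher
```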

Implementation Details

The model leverages a multi-stage training pipeline, combining large-batch pretraining on 400M samples with specialized fine-tuning on 1M carefully curated triplets. It implements Rotary Position Embedding (RPE) to handle sequences up to 8192 tokens, making it particularly suitable for long document processing.

  • 768-dimensional embeddings for optimal representation
  • Support for both standard (2048 tokens) and extended (8192 tokens with RPE) contexts
  • Optimized for retrieval tasks with specialized query-document architecture
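
For the extended 8192-token mode mentioned above, the sketch below shows direct use through Hugging Face transformers. The rotary_scaling_factor and add_pooling_layer keyword arguments, as well as CLS-token pooling, are assumptions carried over from the underlying nomic-bert-style architecture; check the official model card for the exact long-context loading flags.

```python
# Long-context encoding sketch with Hugging Face transformers.
# NOTE: rotary_scaling_factor / add_pooling_layer are assumed kwargs from the
# underlying nomic-bert-style custom code, not confirmed for this checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

name = "Snowflake/snowflake-arctic-embed-m-long"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(
    name,
    trust_remote_code=True,     # custom modeling code
    add_pooling_layer=False,    # we pool manually from the CLS token
    rotary_scaling_factor=2,    # assumed switch for the 8192-token RPE mode
)
model.eval()

docs = ["A very long document would go here ..."]
tokens = tokenizer(
    docs, padding=True, truncation=True, max_length=8192, return_tensors="pt"
)

with torch.no_grad():
    out = model(**tokens)

# CLS-token embedding, L2-normalized for cosine-similarity retrieval.
emb = torch.nn.functional.normalize(out.last_hidden_state[:, 0], p=2, dim=1)
print(emb.shape)  # torch.Size([1, 768])
```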

Core Capabilities

  • High-quality text embeddings for retrieval and similarity tasks
  • Extended context length support with RPE scaling
  • Efficient processing of both short and long documents
  • State-of-the-art performance in MTEB benchmarks
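
The MTEB results cited above can in principle be reproduced with the open-source mteb package. The sketch below runs a single retrieval task as a quick check; it is an assumed evaluation setup, not the exact protocol behind the reported 54.83 average.

```python
# MTEB retrieval evaluation sketch (assumes: pip install mteb sentence-transformers).
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "Snowflake/snowflake-arctic-embed-m-long", trust_remote_code=True
)

# NFCorpus is one of the smaller MTEB retrieval tasks, handy for a quick check;
# the reported score averages the full MTEB Retrieval suite.
evaluation = MTEB(tasks=["NFCorpus"])
results = evaluation.run(model, output_folder="results/arctic-embed-m-long")
print(results)
```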

Frequently Asked Questions

Q: What makes this model unique?

The model combines extended context length support (up to 8192 tokens) with state-of-the-art retrieval performance, making it ideal for applications requiring both accuracy and long document processing.

Q: What are the recommended use cases?

The model excels in document retrieval, semantic search, and similarity matching tasks, particularly where long document context is important. It's especially suitable for enterprise applications requiring high-quality text embeddings.
