Snowflake Arctic Embed L v2.0
| Property | Value |
|---|---|
| Total Parameters | 568M |
| Non-embedding Parameters | 303M |
| Embedding Dimensions | 1024 |
| License | Apache 2.0 |
| Model URL | https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0 |
What is snowflake-arctic-embed-l-v2.0?
Snowflake's Arctic Embed L v2.0 represents a significant advancement in multilingual embedding models, designed specifically for enterprise-grade text retrieval applications. This model sets a new standard by delivering exceptional performance in both English and non-English text retrieval without compromising either.
Implementation Details
Built upon the BAAI/bge-m3-retromae architecture, the model has 568M total parameters, 303M of which are non-embedding parameters. It supports a context window of up to 8,192 tokens via RoPE (Rotary Position Embedding) and uses Matryoshka Representation Learning (MRL), which allows embeddings to be truncated to lower dimensions for efficient compression.
- Achieves state-of-the-art performance on MTEB Retrieval, MIRACL, and CLEF benchmarks
- Supports vector compression down to 128 bytes per document
- Produces 1024-dimensional embeddings for efficient inference
- Offers drop-in replacement capability for existing infrastructure
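To illustrate how MRL truncation can reach the 128-bytes-per-document figure, here is a minimal sketch in NumPy: truncate the leading dimensions of an MRL-trained embedding, re-normalize, and apply simple int8 scalar quantization (one byte per dimension). The `compress_embedding` helper and the symmetric quantization scheme are illustrative assumptions, not Snowflake's published pipeline; a random vector stands in for real model output.

```python
import numpy as np

def compress_embedding(vec: np.ndarray, dims: int = 128) -> np.ndarray:
    """Truncate an MRL-trained embedding and quantize to int8 (1 byte/dim).

    Hypothetical helper for illustration; the real compression recipe may differ.
    """
    truncated = vec[:dims]
    truncated = truncated / np.linalg.norm(truncated)  # re-normalize after truncation
    # Symmetric scalar quantization: map [-1, 1] floats onto int8.
    return np.clip(np.round(truncated * 127), -127, 127).astype(np.int8)

# A mock 1024-dim unit vector stands in for a real Arctic Embed output.
rng = np.random.default_rng(0)
full = rng.normal(size=1024).astype(np.float32)
full /= np.linalg.norm(full)

compressed = compress_embedding(full, dims=128)
print(compressed.nbytes)  # 128 bytes per document
```

Because MRL training concentrates information in the leading dimensions, truncation degrades quality gracefully, which is what makes this kind of aggressive compression viable.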
Core Capabilities
- Superior multilingual performance with NDCG@10 scores of 55.6 on BEIR, 55.8 on MIRACL, and 54.3 on CLEF
- Efficient compression with less than 3% quality degradation when reduced to 256 dimensions
- Enterprise-ready with Apache 2.0 license for commercial applications
- Seamless integration with popular frameworks like Sentence Transformers and Hugging Face
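As a sketch of the retrieval flow these capabilities enable, the snippet below ranks documents against a query by cosine similarity over precomputed embeddings. In practice the embeddings would come from the model (e.g. via Sentence Transformers with `normalize_embeddings=True`); here mock normalized vectors stand in so the logic is self-contained, and `top_k` is a hypothetical helper name.

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most similar documents.

    Assumes all embeddings are L2-normalized, so a dot product
    equals cosine similarity.
    """
    scores = doc_embs @ query_emb
    return np.argsort(scores)[::-1][:k]

# Mock normalized embeddings stand in for real model output.
rng = np.random.default_rng(1)
docs = rng.normal(size=(5, 1024))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[2] + 0.01 * rng.normal(size=1024)  # near-duplicate of doc 2
query /= np.linalg.norm(query)

print(top_k(query, docs, k=1))  # doc 2 ranks first
```

Because the model emits fixed-size normalized vectors, this same dot-product ranking works unchanged whether the vectors are full 1024-dimensional floats or MRL-truncated variants, which is what makes it a drop-in fit for existing vector-search infrastructure.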
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to maintain high performance across multiple languages while supporting efficient compression and long context windows makes it stand out. It achieves this without the usual trade-offs between English and non-English performance.
Q: What are the recommended use cases?
The model is ideal for enterprise-scale multilingual search and retrieval applications, particularly where storage efficiency and query performance are crucial. It's especially suitable for applications requiring high-quality cross-lingual document retrieval and semantic search.