SFR-Embedding-2_R
| Property | Value |
|---|---|
| Parameter Count | 7.11B |
| License | CC-BY-NC-4.0 |
| Tensor Type | BF16 |
| Language | English |
What is SFR-Embedding-2_R?
SFR-Embedding-2_R is a text embedding model developed by Salesforce Research and intended for research use. It builds on the team's earlier SFR-Embedding work and uses a multi-stage training approach to improve performance across natural language processing tasks such as retrieval, classification, clustering, and semantic similarity.
Implementation Details
The model is built to produce one fixed-size embedding per input text. It supports a maximum sequence length of 4096 tokens and stores its weights in BF16 precision, which halves memory use relative to FP32. It can be loaded with either the Transformers library or the Sentence Transformers framework. Key implementation features:
- Instruction-based embedding generation with task-specific prompts
- Last-token pooling strategy for embedding extraction
- Normalized embeddings with cosine similarity scoring
- Support for both query and passage embedding generation
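The four features above can be illustrated on toy data. The following is a minimal sketch, not the model's actual code: plain Python lists stand in for tensors, the `Instruct: ...\nQuery: ...` template is an assumption about the prompt format (it is not stated in this description), and all function names are hypothetical.

```python
import math

def format_query(task_description: str, query: str) -> str:
    """Prefix a query with its task instruction (assumed template)."""
    return f"Instruct: {task_description}\nQuery: {query}"

def last_token_pool(hidden_states, attention_mask):
    """Return, per sequence, the hidden vector of the last non-padding token."""
    pooled = []
    for vectors, mask in zip(hidden_states, attention_mask):
        # Last position where the mask is 1; works for left or right padding.
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(vectors[last])
    return pooled

def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def cosine_scores(queries, passages):
    """Score every query against every passage (inputs must be unit vectors)."""
    return [[sum(a * b for a, b in zip(q, p)) for p in passages] for q in queries]

# Toy data: 2 sequences, 3 token positions, 2-dimensional hidden states.
hidden_states = [
    [[1.0, 0.0], [0.0, 2.0], [9.0, 9.0]],  # right-padded: last real token at index 1
    [[9.0, 9.0], [1.0, 1.0], [3.0, 4.0]],  # left-padded: last real token at index 2
]
attention_mask = [[1, 1, 0], [0, 1, 1]]

embeddings = [l2_normalize(v) for v in last_token_pool(hidden_states, attention_mask)]
# Treat the first embedding as the query and the second as a passage.
scores = cosine_scores(embeddings[:1], embeddings[1:])  # [[0.8]] up to rounding
```

In this sketch only the query would be wrapped with `format_query`; passages are assumed to be embedded as plain text, which is how the query and passage sides differ.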
Core Capabilities
- Strong performance on MTEB benchmark tasks
- Strong retrieval results, reflected in high MAP and MRR scores
- Classification accuracy above 90% on several tasks
- Semantic textual similarity (STS) scoring
- Effective clustering and pair classification
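For readers unfamiliar with the ranking metrics named above, here is a small worked sketch of how MRR and MAP are commonly computed. The relevance lists are made-up illustration data, not results from this model, and the average precision here assumes every relevant item appears somewhere in the ranked list.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0

def average_precision(relevance):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# One list per query; 1 marks a relevant result at that rank (hypothetical data).
rankings = [
    [0, 1, 0, 1],  # first relevant result at rank 2
    [1, 0, 0, 0],  # first relevant result at rank 1
]

mrr = sum(reciprocal_rank(r) for r in rankings) / len(rankings)        # (0.5 + 1.0) / 2 = 0.75
mean_ap = sum(average_precision(r) for r in rankings) / len(rankings)  # (0.5 + 1.0) / 2 = 0.75
```

Both metrics reward placing relevant items early; MRR looks only at the first relevant item, while MAP accounts for every relevant item in the list.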
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its multi-stage training approach and its support for instruction-based embedding generation, where a task-specific prompt steers the embedding. Together with its size (7.11B parameters), this supports strong performance across a wide range of NLP tasks and makes it well suited to research applications.
Q: What are the recommended use cases?
The model is suited to research applications including text retrieval, semantic similarity analysis, document classification, and clustering. It is a good fit where high-quality text embeddings need to be customized per task through instructions.
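For the retrieval use case, a minimal sketch with Sentence Transformers might look like the following. Several things here are assumptions rather than facts from this description: the Hugging Face model id `Salesforce/SFR-Embedding-2_R`, the `Instruct: ...\nQuery: ...` prompt template, and the choice to embed passages without an instruction. The helper names `load_model` and `rank_passages` are hypothetical.

```python
def load_model(model_id: str = "Salesforce/SFR-Embedding-2_R"):
    """Load the model through Sentence Transformers (downloads the weights on first use)."""
    # Imported lazily because the dependency is heavy; model id is an assumption.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)

def rank_passages(model, task_description, query, passages):
    """Return (passage, score) pairs sorted from most to least similar to the query."""
    # Assumed prompt template: only the query carries the task instruction.
    prompt = f"Instruct: {task_description}\nQuery: {query}"
    query_vec = model.encode([prompt], normalize_embeddings=True)[0]
    # Passages are embedded as plain text (assumption).
    passage_vecs = model.encode(passages, normalize_embeddings=True)
    # With unit-length vectors, the dot product equals cosine similarity.
    scores = [float(sum(a * b for a, b in zip(query_vec, vec))) for vec in passage_vecs]
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
```

A call such as `rank_passages(load_model(), "Given a question, retrieve passages that answer it", question, passages)` would then return the passages ordered by similarity. Passing the model in as an argument keeps the ranking logic separate from the expensive load step, so the model is loaded once and reused across queries.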