SFR-Embedding-Code-2B_R

Maintained By
Salesforce

Property      Value
Model Size    2B parameters
Developer     Salesforce Research
Paper         CodeXEmbed: A Generalist Embedding Model Family
License       Research-only (Subject to Gemma Terms of Use)
Performance   67.4% NDCG@10 on CoIR Benchmark

What is SFR-Embedding-Code-2B_R?

SFR-Embedding-Code-2B_R is an embedding model for multilingual, multi-task code and text retrieval. Developed by Salesforce Research, it is described as the largest model in the SFR-Embedding family and reports 67.4% NDCG@10 on the CoIR benchmark, which places it ahead of the existing open-source code embedding models it was compared against.

Implementation Details

The model can be loaded through either the Transformers or the Sentence Transformers library. It supports a maximum sequence length of 32,768 tokens and requires queries to follow a specific instruction format. The architecture is based on Gemma and has been fine-tuned for code retrieval tasks.

  • Built on Gemma architecture with 2B parameters
  • Supports both code and text embedding generation
  • Implements instruction-based query formatting
  • Offers flexible integration through popular ML frameworks
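The instruction-based query formatting listed above can be sketched with Sentence Transformers as follows. This is a minimal illustration, not the official usage: the Hub identifier `Salesforce/SFR-Embedding-Code-2B_R`, the task wording, and the `Instruct: ...\nQuery: ` template are assumptions that do not appear in this document, and `build_query_prompt` and `embed` are hypothetical helper names. Check the model card for the exact values before relying on them.

```python
from typing import List

# Assumptions (not stated in this document): Hub id, task wording, prompt template.
MODEL_ID = "Salesforce/SFR-Embedding-Code-2B_R"
TASK = "Given Code or Text, retrieve relevant content"


def build_query_prompt(task: str) -> str:
    """Return the instruction prefix prepended to each query (hypothetical helper)."""
    return f"Instruct: {task}\nQuery: "


def embed(queries: List[str], passages: List[str]):
    """Embed queries with the instruction prefix and passages without it."""
    # Imported lazily so build_query_prompt works without the library installed.
    from sentence_transformers import SentenceTransformer

    # trust_remote_code is assumed to be needed for the model's custom code.
    model = SentenceTransformer(MODEL_ID, trust_remote_code=True)
    query_vecs = model.encode(queries, prompt=build_query_prompt(TASK))
    passage_vecs = model.encode(passages)  # passages carry no instruction
    return query_vecs, passage_vecs
```

Only queries receive the instruction prefix in this sketch; passages are encoded as plain text. Calling `embed` downloads the model weights, so it is kept separate from the lightweight prompt helper.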

Core Capabilities

  • Multilingual code retrieval and understanding
  • High-performance text-to-code matching
  • Retrieval quality of 67.4% NDCG@10 on the CoIR benchmark
  • Long context support up to 32K tokens
  • Flexible API support for both Transformers and Sentence Transformers
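For readers unfamiliar with the NDCG@10 metric quoted above, the sketch below shows one common way to compute it for a single query. It uses the linear-gain formulation; the benchmark's own evaluation code may differ in details, and the function names are illustrative only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_relevances holds the relevance label of every judged document,
    ordered by the system's ranking (best-scored first).
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

A benchmark score such as 67.4% is the mean of this per-query value over all queries, expressed as a percentage.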

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its code retrieval performance, reporting the highest NDCG@10 score (67.4%) among comparable models on the CoIR benchmark. It is also notable for its 2B parameter count and for handling both code and text embedding tasks.

Q: What are the recommended use cases?

The model is specifically designed for research purposes in code retrieval, documentation matching, and code-text alignment tasks. It's particularly useful for applications requiring high-quality code search, code documentation linking, and similar code detection.
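As an illustration of the similar code detection use case, the sketch below flags pairs of snippets whose embeddings exceed a cosine-similarity threshold. The vectors are toy values rather than real model output, the threshold of 0.9 is an arbitrary assumption, and `find_similar_pairs` is a hypothetical helper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar_pairs(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return snippet pairs scoring at or above the threshold, best first."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy embeddings standing in for model output (illustrative values only).
toy_embeddings = {
    "add_v1": [1.0, 0.0, 0.1],
    "add_v2": [0.9, 0.1, 0.1],
    "parse_json": [0.0, 1.0, 0.0],
}
similar = find_similar_pairs(toy_embeddings, threshold=0.9)
```

In practice the threshold would be tuned on labeled pairs, and the all-pairs loop would be replaced by an approximate nearest-neighbor index for large code bases.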
