SFR-Embedding-Mistral

Property	Value
Parameter Count	7.11B
License	CC-BY-NC-4.0
Base Model	Mistral-7B-v0.1
Type	Text Embeddings

What is SFR-Embedding-Mistral?

SFR-Embedding-Mistral is a state-of-the-art text embedding model developed by Salesforce Research. It was trained on top of E5-mistral-7b-instruct, which is itself built on Mistral-7B-v0.1. The model is designed primarily for text retrieval and performs strongly across the task categories of MTEB (the Massive Text Embedding Benchmark).

Implementation Details

The model has 7.11B parameters and represents each input text as a single dense vector. It supports a maximum sequence length of 4096 tokens, and its embeddings are L2-normalized, so the dot product of two embeddings equals their cosine similarity. This makes them directly usable for similarity comparison and retrieval.
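Because the embeddings are L2-normalized, comparing two of them reduces to a plain dot product. The sketch below illustrates that relationship in pure Python. The helper names (`l2_normalize`, `dot`, `cosine`) are illustrative and not part of any library, and the short toy vectors are hypothetical stand-ins for real model outputs.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed directly from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 4-dimensional vectors standing in for real embeddings (hypothetical data).
query = [0.2, 0.9, 0.1, 0.0]
doc_a = [0.1, 0.8, 0.2, 0.1]
doc_b = [0.9, 0.0, 0.1, 0.3]

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    # After normalization, a plain dot product gives the same score as cosine().
    score = dot(l2_normalize(query), l2_normalize(doc))
    print(name, round(score, 3), round(cosine(query, doc), 3))
```

Normalizing once at embedding time means a search system only needs dot products afterwards, which is what makes large-scale nearest-neighbour lookup cheap.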

Built on the Mistral-7B architecture, with additional training specialized for embedding generation
Can be loaded with either Hugging Face Transformers or Sentence Transformers
Optimized for retrieval and semantic similarity tasks
Uses last-token pooling: the embedding is the final-layer hidden state of the last non-padding token
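Last-token pooling takes the final hidden state of the last real (non-padding) token as the text's embedding. In a decoder-only model such as Mistral, causal attention means that token has attended to every earlier token in the input. Below is a minimal pure-Python sketch of the idea, assuming hidden states arrive as nested lists shaped batch × sequence × hidden size together with a 0/1 attention mask. A real implementation would use tensor indexing (for example in PyTorch), and `last_token_pool` here is illustrative rather than the model's own code.

```python
from typing import List, Optional

Vector = List[float]


def last_token_pool(hidden_states: List[List[Vector]],
                    attention_mask: List[List[int]]) -> List[Vector]:
    """For each sequence, return the hidden state of its last real token.

    A real token is a position whose attention-mask value is 1. Looking for
    the last such index handles both right padding (pads at the end) and
    left padding (pads at the start).
    """
    pooled: List[Vector] = []
    for states, mask in zip(hidden_states, attention_mask):
        if len(states) != len(mask):
            raise ValueError("hidden states and mask must have the same length")
        # Index of the last position whose mask value is 1 (None if all padding).
        last: Optional[int] = max(
            (i for i, m in enumerate(mask) if m == 1), default=None
        )
        if last is None:
            raise ValueError("sequence contains no non-padding tokens")
        pooled.append(states[last])
    return pooled


# Toy batch: 2 sequences, 4 positions each, hidden size 2 (hypothetical values).
hidden = [
    [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [9.0, 9.0]],  # right-padded
    [[9.0, 9.0], [0.4, 0.4], [0.5, 0.5], [0.6, 0.6]],  # left-padded
]
mask = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
print(last_token_pool(hidden, mask))
```

The padding positions hold the value 9.0 so that any indexing mistake would be obvious in the output.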

Core Capabilities

High-performance text embeddings for information retrieval
Semantic similarity calculation
Document and query matching
Support for detailed task instructions in queries
Strong performance on MTEB benchmark tasks
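Task instructions are attached to queries only, and documents are embedded without them. The sketch below assumes the `Instruct: {task}` / `Query: {query}` two-line template that appears in the published usage examples for the E5-mistral model family; confirm the exact format against the model card before relying on it. `format_query`, `build_inputs`, the task description, and the sample texts are all hypothetical.

```python
from typing import List


def format_query(task_description: str, query: str) -> str:
    """Prepend a one-sentence task instruction to a query.

    Assumed template: "Instruct: {task}" and "Query: {query}" on two lines.
    """
    return f"Instruct: {task_description}\nQuery: {query}"


def build_inputs(task_description: str,
                 queries: List[str],
                 documents: List[str]) -> List[str]:
    """Instruct the queries; pass documents through unchanged."""
    return [format_query(task_description, q) for q in queries] + list(documents)


# Hypothetical task description and texts for illustration.
task = "Given a web search query, retrieve relevant passages that answer the query"
inputs = build_inputs(
    task,
    queries=["how do solar panels work"],
    documents=["Solar panels convert sunlight into electricity using photovoltaic cells."],
)
for text in inputs:
    print(repr(text))
```

Keeping documents instruction-free means a document index can be built once and reused across different query tasks. Only the query side changes when the task changes.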

Frequently Asked Questions

Q: What makes this model unique?

It pairs the Mistral-7B architecture with further embedding-specific training on top of E5-mistral-7b-instruct. The result is state-of-the-art retrieval performance, and queries can carry a natural-language task instruction that tells the model what kind of match to look for.

Q: What are the recommended use cases?

The model excels in document retrieval, semantic search, text similarity analysis, and various classification tasks. It's particularly well-suited for applications requiring precise text representation and matching.
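For semantic search, the core step is ranking stored document embeddings by their similarity to a query embedding. A minimal sketch, assuming every vector is already L2-normalized (so the dot product is the cosine similarity) and using tiny hypothetical 2-dimensional vectors in place of real embeddings; `top_k` is an illustrative helper, not a library function.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(query_emb: List[float],
          doc_embs: Dict[str, List[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by dot product with the query and keep the k best.

    Assumes every vector is already L2-normalized, so the dot product
    equals cosine similarity.
    """
    scores = (
        (doc_id, sum(q * d for q, d in zip(query_emb, emb)))
        for doc_id, emb in doc_embs.items()
    )
    # nlargest avoids fully sorting the collection when k is small.
    return heapq.nlargest(k, scores, key=lambda item: item[1])


# Toy unit vectors standing in for real document embeddings (hypothetical data).
docs = {
    "doc-1": [1.0, 0.0],
    "doc-2": [0.0, 1.0],
    "doc-3": [0.6, 0.8],
}
query = [0.8, 0.6]
for doc_id, score in top_k(query, docs, k=2):
    print(doc_id, round(score, 2))
```

A linear scan like this is fine for small collections. Larger corpora would typically use an approximate nearest-neighbour index instead.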