SFR-Embedding-Mistral
| Property | Value |
|---|---|
| Parameter Count | 7.11B |
| License | CC-BY-NC-4.0 |
| Base Model | Mistral-7B-v0.1 |
| Type | Text Embeddings |
What is SFR-Embedding-Mistral?
SFR-Embedding-Mistral is a state-of-the-art text embedding model developed by Salesforce Research. It is trained on top of E5-mistral-7b-instruct, which is itself built on Mistral-7B-v0.1. The model is designed for text retrieval tasks and performs strongly across the MTEB benchmark evaluations.
Implementation Details
The model uses a 7.11B-parameter decoder-only architecture to produce text embeddings. It supports a maximum sequence length of 4096 tokens, and its normalized embeddings can be used directly for similarity comparison and retrieval tasks.
- Built on Mistral-7B architecture with specialized training for embedding generation
- Supports both Transformers and Sentence Transformers implementations
- Optimized for retrieval and semantic similarity tasks
- Implements last-token pooling for embedding extraction
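The last-token pooling step mentioned above can be sketched as follows. This is a minimal NumPy illustration of the technique, not the model's actual code, and it assumes right-padded inputs:

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pool each sequence to the hidden state of its last non-padding token.

    hidden_states:  (batch, seq_len, dim) final-layer hidden states
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    Assumes right padding, so the last real token sits at sum(mask) - 1.
    """
    last_positions = attention_mask.sum(axis=1) - 1
    pooled = hidden_states[np.arange(hidden_states.shape[0]), last_positions]
    # Normalize to unit length so dot products equal cosine similarity.
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
```

With a left-padded batch (common for decoder-only models), the last real token is simply the final position, so the indexing step would change accordingly.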
Core Capabilities
- High-performance text embeddings for information retrieval
- Semantic similarity calculation
- Document and query matching
- Support for detailed task instructions in queries
- Strong performance on MTEB benchmark tasks
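Task instructions are typically prepended to the query text before embedding. The sketch below follows the E5-Mistral-style template; the exact format should be checked against the official model card, and the helper name here is illustrative:

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    # Queries carry an instruction prefix describing the retrieval task;
    # documents are usually embedded as-is, without an instruction.
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
query = get_detailed_instruct(task, "How do text embeddings work?")
```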
Frequently Asked Questions
Q: What makes this model unique?
The model combines the Mistral-7B architecture with training targeted specifically at embedding generation, delivering state-of-the-art performance on retrieval tasks while accepting task instructions in queries for better task specification.
Q: What are the recommended use cases?
The model excels in document retrieval, semantic search, text similarity analysis, and various classification tasks. It's particularly well-suited for applications requiring precise text representation and matching.
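As a sketch of how document retrieval works with normalized embeddings, documents can be ranked by dot product with the query, which equals cosine similarity for unit vectors. The vectors below are toy stand-ins for real model output:

```python
import numpy as np

def rank_documents(query_emb: np.ndarray, doc_embs: np.ndarray) -> list[int]:
    """Return document indices sorted by descending similarity to the query.

    Both inputs are assumed unit-normalized, so a dot product gives the
    cosine similarity between the query and each document.
    """
    scores = doc_embs @ query_emb
    return np.argsort(-scores).tolist()

# Toy unit vectors standing in for real embeddings.
query = np.array([1.0, 0.0])
docs = np.array([[0.0, 1.0],                     # orthogonal:   score 0.0
                 [1.0, 0.0],                     # identical:    score 1.0
                 [np.sqrt(0.5), np.sqrt(0.5)]])  # 45 degrees:   score ~0.71
print(rank_documents(query, docs))  # → [1, 2, 0]
```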