gte-Qwen1.5-7B-instruct
| Property | Value |
|---|---|
| Parameter Count | 7 billion |
| Embedding Dimension | 4096 |
| Max Input Tokens | 32k |
| MTEB Score | 67.34 |
| C-MTEB Score | 69.52 |
| Model Hub | Hugging Face |
What is gte-Qwen1.5-7B-instruct?
gte-Qwen1.5-7B-instruct is a text embedding model developed by Alibaba-NLP and built on the Qwen1.5-7B language model. It pairs the language understanding of the base model with embedding-specific training to produce multilingual text representations.
Implementation Details
The model adapts the Qwen1.5-7B architecture for embedding by using bidirectional attention and by applying instruction tuning only on the query side. It produces 4096-dimensional embeddings and accepts inputs of up to 32k tokens, so it can handle both short queries and long documents.
- Built on the Qwen1.5-7B base model
- Uses bidirectional attention, so each token can attend to context on both sides
- Applies instruction tuning to queries only; documents are embedded without instructions
- Supports multilingual text
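Query-side instruction tuning means only the query text carries a task description, while documents are embedded as plain text. The sketch below shows that split. It assumes the `Instruct: {task}` / `Query: {query}` template (joined by a newline) shown on the public Hugging Face model card; verify the template there before relying on it. The helper names `build_query_text` and `build_inputs` are hypothetical and not part of any library.

```python
def build_query_text(task: str, query: str) -> str:
    """Prefix a query with a one-sentence task instruction."""
    # ASSUMPTION: "Instruct: <task>" and "Query: <query>" joined by a newline,
    # following the public model card; verify the template for your model version.
    return f"Instruct: {task}\nQuery: {query}"


def build_inputs(
    task: str, queries: list[str], documents: list[str]
) -> tuple[list[str], list[str]]:
    """Return (query_texts, document_texts) ready to be embedded.

    Only the queries receive the task instruction; documents pass through unchanged.
    """
    query_texts = [build_query_text(task, q) for q in queries]
    document_texts = list(documents)
    return query_texts, document_texts


# Example task description (illustrative; adapt it to your own retrieval task).
task = "Given a web search query, retrieve relevant passages that answer the query"
query_texts, document_texts = build_inputs(
    task,
    queries=["how do text embedding models work"],
    documents=["Embedding models map text to fixed-length vectors."],
)
print(query_texts[0])
print(document_texts[0])
```

Because documents carry no instruction, their embeddings do not depend on the query task, so a document index can be built once and reused across different query tasks.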
Core Capabilities
- High scores on the MTEB (67.34) and C-MTEB (69.52) benchmarks
- Multilingual text embedding generation
- Context window of up to 32k tokens
- Semantic similarity computation between texts
- Query-document matching for retrieval
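Semantic similarity and query-document matching usually reduce to comparing embedding vectors. A minimal sketch, assuming the embeddings have already been produced by the model (4096 floats each for this model; tiny toy vectors here): normalize each vector to unit length, take dot products (cosine similarity), and sort. Cosine similarity is a common choice for this step rather than something the source specifies, and the function names are hypothetical.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_documents(
    query_emb: list[float], doc_embs: list[list[float]], top_k: int = 3
) -> list[tuple[int, float]]:
    """Return (document index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_emb, d)) for i, d in enumerate(doc_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real model embeddings.
query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(rank_documents(query, docs, top_k=2))
```

With real embeddings the same code applies unchanged; for large collections, normalizing document vectors once at indexing time and using a vectorized library avoids recomputing norms for every query.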
Frequently Asked Questions
Q: What makes this model unique?
The model combines the language understanding inherited from Qwen1.5-7B with embedding-specific training. It scores 67.34 on the English MTEB benchmark and 69.52 on the Chinese C-MTEB benchmark, which makes it a strong option for applications that handle both languages.
Q: What are the recommended use cases?
The model is suited to semantic search, document retrieval, similarity matching, and cross-lingual information retrieval. Its 32k-token context window makes it well suited to embedding long documents and complex queries.
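For long documents, the practical question is whether the text fits in the context window before embedding. A minimal sketch, assuming the text has already been converted to a list of token IDs (in practice with the model's own tokenizer): documents within the limit are kept whole, and longer ones are split into overlapping windows that can each be embedded separately. The default limit of 32,000 tokens, the overlap size, and the helper name `split_into_windows` are assumptions; a real budget should also leave headroom for any special tokens the tokenizer adds.

```python
def split_into_windows(
    token_ids: list[int], max_tokens: int = 32_000, overlap: int = 256
) -> list[list[int]]:
    """Split a token sequence into windows of at most max_tokens tokens.

    ASSUMPTION: max_tokens approximates the advertised 32k context; the exact
    limit and the overlap are illustrative, not prescribed by the model.
    Consecutive windows share `overlap` tokens so text at a boundary keeps context.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    if len(token_ids) <= max_tokens:
        return [list(token_ids)]  # fits in one pass: embed the whole document
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # this window already reaches the end of the document
    return windows


print(len(split_into_windows(list(range(1_000)))))  # → 1
print(len(split_into_windows(list(range(70_000)))))  # → 3
```

How the per-window embeddings are then used (for example, indexing each window separately or averaging them into one vector) is a separate design choice that the source does not address.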