# BGE Large English Embedding Model
| Property | Value |
|---|---|
| Parameter Count | 335M |
| License | MIT |
| Paper | C-Pack: Packaged Resources To Advance General Chinese Embedding |
| Framework | PyTorch with Transformers |
## What is bge-large-en?
BGE-large-en is a state-of-the-art text embedding model developed by BAAI that maps text to dense vector representations. It achieves top performance on the MTEB benchmark, making it particularly effective for semantic search, similarity comparison, and retrieval tasks.
## Implementation Details
The model utilizes a transformer-based architecture with 335M parameters and generates 1024-dimensional embeddings. It's trained using contrastive learning on large-scale paired data and supports sequence lengths up to 512 tokens.
- Optimized for both retrieval and semantic similarity tasks
- Supports efficient batched processing with FP16 computation
- Provides specialized query instruction handling for improved retrieval performance (see the sketch below)
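As a concrete illustration of the points above, here is a minimal sketch of embedding generation with the Hugging Face Transformers API. It follows the standard BGE recipe of CLS-token pooling plus L2 normalization, and it shows the query instruction prefix recommended for BGE v1 retrieval; the sample texts are illustrative, and the instruction string should be verified against the official model card.

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "BAAI/bge-large-en"  # public Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()
# For FP16 inference on GPU, load instead with:
# AutoModel.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to("cuda")

# For retrieval, BGE v1 recommends prefixing *queries* (not passages)
# with this instruction string.
instruction = "Represent this sentence for searching relevant passages: "
texts = [
    instruction + "what is a text embedding?",
    "Text embeddings map sentences to dense vectors for semantic search.",
]

# Sequences longer than 512 tokens are truncated (the model's maximum).
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# CLS pooling: take the hidden state of the first token, then L2-normalize
# so that dot products equal cosine similarities.
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

print(embeddings.shape)               # torch.Size([2, 1024])
print(embeddings[0] @ embeddings[1])  # query/passage cosine similarity
```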
## Core Capabilities
- High-performance text embedding generation for retrieval tasks
- Excellent performance on classification, clustering, and reranking
- Robust semantic understanding across diverse English domains
- Efficient integration with popular frameworks such as Sentence-Transformers and LangChain (example below)
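For example, the Sentence-Transformers integration reduces a full embed-and-rank pipeline to a few lines. This is a minimal sketch assuming the sentence-transformers package is installed; the query and passage strings are illustrative.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en")

query = ("Represent this sentence for searching relevant passages: "
         "how do I compute sentence similarity?")
passages = [
    "Cosine similarity between normalized embeddings measures semantic closeness.",
    "The stock market closed higher today.",
]

# normalize_embeddings=True makes dot products equal cosine similarities.
query_emb = model.encode(query, normalize_embeddings=True)
passage_embs = model.encode(passages, normalize_embeddings=True)

scores = passage_embs @ query_emb  # higher score = more relevant
for passage, score in sorted(zip(passages, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {passage}")
```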
## Frequently Asked Questions
### Q: What makes this model unique?
The model achieves state-of-the-art performance on the MTEB benchmark and provides specialized query instruction handling for improved retrieval performance. It's particularly notable for its balance of efficiency and accuracy.
### Q: What are the recommended use cases?
The model excels in semantic search, document retrieval, similarity comparison, and text classification tasks. It's particularly well-suited for production environments requiring high-quality text embeddings.
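In such production pipelines the model is often consumed through a framework wrapper rather than called directly. The following sketch assumes LangChain's community package and its HuggingFaceBgeEmbeddings wrapper; check the import path against your installed LangChain version, as it has moved between releases.

```python
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-large-en",
    model_kwargs={"device": "cpu"},  # or "cuda" for GPU inference
    encode_kwargs={"normalize_embeddings": True},
)

# embed_query applies the wrapper's retrieval instruction prefix;
# embed_documents embeds passages as-is.
query_vec = embeddings.embed_query("what license does bge-large-en use?")
doc_vecs = embeddings.embed_documents(["bge-large-en is released under MIT."])
print(len(query_vec))  # 1024-dimensional vector
```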