BGE Small English Embedding Model
| Property | Value |
|---|---|
| Parameter Count | 33.4M |
| Model Type | Text Embeddings |
| License | MIT |
| Primary Paper | C-Pack: Packaged Resources To Advance General Chinese Embedding |
What is bge-small-en?
BGE-small-en is a compact embedding model developed by BAAI (Beijing Academy of Artificial Intelligence), designed to generate high-quality text embeddings for English content. Despite its small size of 33.4M parameters, it reaches a 62.11 average score on the MTEB benchmark, making it an efficient choice for text similarity and retrieval tasks.
Implementation Details
The model uses a transformer encoder architecture and is optimized through contrastive learning. It supports a maximum sequence length of 512 tokens and handles retrieval tasks through a special query instruction prefix.
- Achieves 62.11 average score on MTEB benchmark
- Optimized for both similarity matching and retrieval tasks
- Supports efficient inference with FP16 precision
- Integrated with popular frameworks such as Sentence-Transformers and LangChain (see the sketch after this list)
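As a minimal sketch of the Sentence-Transformers integration, assuming the `BAAI/bge-small-en` checkpoint is available on the Hugging Face Hub, embedding generation looks roughly like this:

```python
from sentence_transformers import SentenceTransformer

# Load the bge-small-en checkpoint from the Hugging Face Hub.
model = SentenceTransformer("BAAI/bge-small-en")
# model.half()  # optional: FP16 inference on GPU

sentences = [
    "BGE is an embedding model released by BAAI.",
    "Text embeddings map sentences to dense vectors.",
]

# normalize_embeddings=True returns unit-length vectors, so a dot
# product between two embeddings equals their cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384): one 384-dimensional vector per sentence
```

Because the vectors are normalized, `embeddings @ embeddings.T` gives pairwise cosine similarities directly, with no extra scaling step.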
Core Capabilities
- Text Embedding Generation: Creates 384-dimensional dense vectors
- Semantic Search: Excellent performance in retrieval tasks (51.82 on MTEB retrieval)
- Classification Tasks: Strong performance (74.37 on classification benchmarks)
- Cross-encoder Compatibility: Can be paired with a BGE reranker for improved accuracy (a retrieve-and-rerank sketch follows this list)
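The following sketch shows the two-stage retrieve-and-rerank pattern: bge-small-en narrows the corpus quickly, then a cross-encoder rescores the candidates. The reranker checkpoint name `BAAI/bge-reranker-base` and the toy corpus are assumptions for illustration:

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

# Stage 1: fast bi-encoder retrieval with bge-small-en.
embedder = SentenceTransformer("BAAI/bge-small-en")

corpus = [
    "The Eiffel Tower is in Paris.",
    "Embeddings map text to dense vectors.",
    "BGE models are trained with contrastive learning.",
]
corpus_emb = embedder.encode(corpus, normalize_embeddings=True)

# BGE English models expect this prefix on retrieval queries
# (passages are embedded without it).
query = "How are BGE models trained?"
query_emb = embedder.encode(
    "Represent this sentence for searching relevant passages: " + query,
    normalize_embeddings=True,
)

hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2: rescore query-passage pairs with a BGE cross-encoder reranker.
reranker = CrossEncoder("BAAI/bge-reranker-base")
scores = reranker.predict([(query, passage) for passage in candidates])
print(candidates[scores.argmax()])  # highest-scoring passage after reranking
```

The bi-encoder keeps stage 1 cheap (one embedding per document, computed once), while the cross-encoder reads each query-passage pair jointly and is only applied to the short candidate list.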
Frequently Asked Questions
Q: What makes this model unique?
A: The model offers an excellent balance between size and embedding quality, making it well suited to deployment in resource-constrained environments.
Q: What are the recommended use cases?
A: The model excels at semantic search, document retrieval, and text similarity tasks. For retrieval, it is most effective when queries carry the instruction prefix "Represent this sentence for searching relevant passages:", as in the sketch below.
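A minimal sketch of this retrieval pattern with the plain transformers API, assuming the `BAAI/bge-small-en` checkpoint (BGE models pool the [CLS] token embedding and L2-normalize it):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")
model = AutoModel.from_pretrained("BAAI/bge-small-en")
model.eval()

INSTRUCTION = "Represent this sentence for searching relevant passages: "

query = INSTRUCTION + "what is a text embedding?"  # prefix on the query only
passages = [
    "An embedding is a dense vector representation of text.",
    "Paris is the capital of France.",
]

def embed(texts):
    # Batch-encode, truncating to the model's 512-token limit.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    # Take the [CLS] token embedding and L2-normalize it.
    emb = output.last_hidden_state[:, 0]
    return torch.nn.functional.normalize(emb, p=2, dim=1)

q = embed([query])   # instruction-prefixed query
p = embed(passages)  # passages, embedded without the prefix
print(q @ p.T)       # cosine similarities, since vectors are unit-norm
```

Note the asymmetry: only queries receive the instruction prefix, while passages are embedded as-is.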