BGE Small English Embedding Model
| Property | Value |
|---|---|
| Parameter Count | 33.4M |
| Model Type | Text Embeddings |
| License | MIT |
| Primary Paper | C-Pack: Packaged Resources To Advance General Chinese Embedding |
What is bge-small-en?
BGE-small-en is a compact embedding model developed by BAAI (Beijing Academy of Artificial Intelligence), designed to generate high-quality text embeddings for English content. Despite its small size of 33.4M parameters, it reaches a 62.11 average score on the MTEB benchmark, making it an efficient choice for text similarity and retrieval tasks.
Implementation Details
The model uses a transformer encoder architecture and is optimized through contrastive learning. It supports a maximum sequence length of 512 tokens and handles retrieval tasks through a special query instruction prefix.
- Achieves 62.11 average score on MTEB benchmark
- Optimized for both similarity matching and retrieval tasks
- Supports efficient inference with FP16 precision
- Integrated with popular frameworks such as Sentence-Transformers and LangChain (see the sketch after this list)
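As a minimal sketch of the Sentence-Transformers integration, assuming the `BAAI/bge-small-en` checkpoint is available on the Hugging Face Hub, embedding generation looks roughly like this:

```python
from sentence_transformers import SentenceTransformer

# Load the bge-small-en checkpoint from the Hugging Face Hub.
model = SentenceTransformer("BAAI/bge-small-en")
# model.half()  # optional: FP16 inference on GPU

sentences = [
    "BGE is an embedding model released by BAAI.",
    "Text embeddings map sentences to dense vectors.",
]

# normalize_embeddings=True returns unit-length vectors, so a dot
# product between two embeddings equals their cosine similarity.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384): one 384-dimensional vector per sentence
```

Because the vectors are normalized, `embeddings @ embeddings.T` gives pairwise cosine similarities directly, with no extra scaling step.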
Core Capabilities
- Text Embedding Generation: Creates 384-dimensional dense vectors
- Semantic Search: Excellent performance in retrieval tasks (51.82 on MTEB retrieval)
- Classification Tasks: Strong performance (74.37 on classification benchmarks)
- Cross-encoder Compatibility: Can be paired with a BGE reranker for improved accuracy (a retrieve-and-rerank sketch follows this list)
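The following sketch shows the two-stage retrieve-and-rerank pattern: bge-small-en narrows the corpus quickly, then a cross-encoder rescores the candidates. The reranker checkpoint name `BAAI/bge-reranker-base` and the toy corpus are assumptions for illustration:

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

# Stage 1: fast bi-encoder retrieval with bge-small-en.
embedder = SentenceTransformer("BAAI/bge-small-en")

corpus = [
    "The Eiffel Tower is in Paris.",
    "Embeddings map text to dense vectors.",
    "BGE models are trained with contrastive learning.",
]
corpus_emb = embedder.encode(corpus, normalize_embeddings=True)

# BGE English models expect this prefix on retrieval queries
# (passages are embedded without it).
query = "How are BGE models trained?"
query_emb = embedder.encode(
    "Represent this sentence for searching relevant passages: " + query,
    normalize_embeddings=True,
)

hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2: rescore query-passage pairs with a BGE cross-encoder reranker.
reranker = CrossEncoder("BAAI/bge-reranker-base")
scores = reranker.predict([(query, passage) for passage in candidates])
print(candidates[scores.argmax()])  # highest-scoring passage after reranking
```

The bi-encoder keeps stage 1 cheap (one embedding per document, computed once), while the cross-encoder reads each query-passage pair jointly and is only applied to the short candidate list.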
Frequently Asked Questions
Q: What makes this model unique?
A: The model offers an excellent balance between size and embedding quality, making it well suited to deployment in resource-constrained environments.
Q: What are the recommended use cases?
A: The model excels at semantic search, document retrieval, and text similarity tasks. For retrieval, it is most effective when queries carry the instruction prefix "Represent this sentence for searching relevant passages:", as in the sketch below.
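A minimal sketch of this retrieval pattern with the plain transformers API, assuming the `BAAI/bge-small-en` checkpoint (BGE models pool the [CLS] token embedding and L2-normalize it):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")
model = AutoModel.from_pretrained("BAAI/bge-small-en")
model.eval()

INSTRUCTION = "Represent this sentence for searching relevant passages: "

query = INSTRUCTION + "what is a text embedding?"  # prefix on the query only
passages = [
    "An embedding is a dense vector representation of text.",
    "Paris is the capital of France.",
]

def embed(texts):
    # Batch-encode, truncating to the model's 512-token limit.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    # Take the [CLS] token embedding and L2-normalize it.
    emb = output.last_hidden_state[:, 0]
    return torch.nn.functional.normalize(emb, p=2, dim=1)

q = embed([query])   # instruction-prefixed query
p = embed(passages)  # passages, embedded without the prefix
print(q @ p.T)       # cosine similarities, since vectors are unit-norm
```

Note the asymmetry: only queries receive the instruction prefix, while passages are embedded as-is.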