bge-small-en

Maintained By: BAAI

BGE Small English Embedding Model

Property         Value
Parameter Count  33.4M
Model Type       Text Embeddings
License          MIT
Primary Paper    C-Pack: Packaged Resources To Advance General Chinese Embedding

What is bge-small-en?

BGE-small-en is a compact English text embedding model developed by BAAI. With only 33.4M parameters, it scores competitively on the MTEB benchmark, which makes it an efficient choice for text similarity and retrieval tasks.

Implementation Details

The model uses a transformer architecture and is trained with contrastive learning. It supports a maximum sequence length of 512 tokens and handles retrieval tasks through an instruction prepended to the query.
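Because inputs are capped at 512 tokens, longer documents have to be truncated or split before encoding. The following is a minimal sketch of one way to split a list of token IDs into windows that fit. It assumes a BERT-style tokenizer that adds two special tokens per sequence (hence the 510-token budget); the function name `split_token_ids` and the `overlap` parameter are illustrative and not part of any BGE API.

```python
from typing import List

MAX_SEQ_LEN = 512    # maximum sequence length stated for the model
SPECIAL_TOKENS = 2   # assumption: one start and one end token are added per sequence


def split_token_ids(token_ids: List[int], overlap: int = 0) -> List[List[int]]:
    """Split token IDs into consecutive windows that fit within the model limit.

    `overlap` is the number of tokens shared between neighbouring windows.
    """
    window = MAX_SEQ_LEN - SPECIAL_TOKENS
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than the window")
    step = window - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + window])
        # Stop once the current window already reaches the end of the input.
        if start + window >= len(token_ids):
            break
    return chunks


# Stand-in token IDs; a real pipeline would take these from the tokenizer.
example_chunks = split_token_ids(list(range(1200)))
print([len(chunk) for chunk in example_chunks])
```

Each window can then be embedded separately and the resulting vectors stored or pooled, depending on the application.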

  • Achieves an average score of 62.11 on the MTEB benchmark
  • Optimized for both similarity matching and retrieval tasks
  • Supports efficient inference with FP16 precision
  • Integrated with popular frameworks such as Sentence-Transformers and LangChain
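To put the FP16 point in numbers, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. It assumes 4 bytes per parameter in FP32 and 2 bytes in FP16, uses decimal megabytes, and ignores activations, tokenizer data, and framework overhead, so real memory use will be higher.

```python
PARAMETER_COUNT = 33_400_000  # 33.4M parameters, as listed for the model

# Assumption: standard storage sizes for each floating-point format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mb(precision: str) -> float:
    """Approximate size of the model weights in decimal megabytes."""
    return PARAMETER_COUNT * BYTES_PER_PARAM[precision] / 1_000_000


for precision in ("fp32", "fp16"):
    print(f"{precision}: {weight_memory_mb(precision):.1f} MB")
```

Under these assumptions, halving the precision halves the weight footprint, from about 133.6 MB to about 66.8 MB.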

Core Capabilities

  • Text Embedding Generation: Creates 384-dimensional dense vectors
  • Semantic Search: Scores 51.82 on the MTEB retrieval tasks
  • Classification Tasks: Scores 74.37 on classification benchmarks
  • Cross-encoder Compatibility: Can be paired with a BGE reranker for improved accuracy
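Semantic search over dense vectors usually comes down to comparing a query embedding with each passage embedding. The sketch below ranks passages by cosine similarity using only the standard library. The three-dimensional vectors are toy stand-ins for the 384-dimensional embeddings the model produces, and the helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real 384-dimensional embeddings.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, passages)
print([index for index, _ in ranking])
```

In a two-stage setup, the top few results from a ranking like this would be the candidates handed to a reranker for more precise scoring.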

Frequently Asked Questions

Q: What makes this model unique?

The model offers an excellent balance between model size and performance, making it particularly suitable for deployment in resource-constrained environments while maintaining strong embedding quality.

Q: What are the recommended use cases?

The model is well suited to semantic search, document retrieval, and text similarity tasks. For retrieval, it is particularly effective when the instruction "Represent this sentence for searching relevant passages:" is prepended to the query.
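A minimal sketch of how the query instruction might be applied with Sentence-Transformers follows. Several details are assumptions rather than facts from this page: the Hugging Face model identifier `BAAI/bge-small-en`, the single space between the instruction and the query, and the choice to leave passages without any prefix. The helper names are illustrative.

```python
from typing import List, Tuple

QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages:"


def with_query_instruction(query: str) -> str:
    """Prefix a search query with the retrieval instruction.

    Assumption: instruction and query are separated by a single space.
    """
    return f"{QUERY_INSTRUCTION} {query.strip()}"


def rank_passages(
    query: str, passages: List[str], model_name: str = "BAAI/bge-small-en"
) -> List[Tuple[str, float]]:
    """Embed the query and passages, then rank passages by cosine similarity.

    Requires the sentence-transformers package; the model name is an assumption.
    """
    # Imported lazily so the helper above works without the dependency installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Only the query gets the instruction; passages are encoded as-is.
    query_emb = model.encode([with_query_instruction(query)], normalize_embeddings=True)
    passage_embs = model.encode(passages, normalize_embeddings=True)
    # With normalized embeddings, the dot product equals cosine similarity.
    scores = (query_emb @ passage_embs.T)[0]
    return sorted(zip(passages, scores.tolist()), key=lambda pair: pair[1], reverse=True)
```

Calling `rank_passages("how do embeddings work?", ["...", "..."])` downloads the model on first use and returns the passages ordered from most to least similar.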
