bge-large-en

Maintained By
BAAI

BGE Large English Embedding Model

  • Parameter Count: 335M
  • License: MIT
  • Paper: C-Pack: Packaged Resources To Advance General Chinese Embedding
  • Framework: PyTorch with Transformers

What is bge-large-en?

BGE-large-en is a text embedding model developed by BAAI that maps text to dense vector representations. At release it ranked at the top of the MTEB (Massive Text Embedding) benchmark leaderboard, making it particularly effective for semantic search, similarity comparison, and retrieval tasks.

Implementation Details

The model uses a transformer-based architecture with 335M parameters and generates 1024-dimensional embeddings. It was trained with contrastive learning on large-scale paired data and supports sequence lengths up to 512 tokens.

  • Optimized for both retrieval and semantic similarity tasks
  • Supports efficient batched processing with FP16 computation
  • Provides specialized query instruction handling for improved retrieval performance
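To make the embedding step concrete, the sketch below shows the pooling and normalization that BGE-style models apply to the transformer's output. It uses a random NumPy array as a stand-in for the model's last hidden states (in real use you would obtain these by running `BAAI/bge-large-en` through Hugging Face Transformers); the 1024 dimension matches the model, everything else is illustrative.

```python
import numpy as np

# Stand-in for the transformer's last hidden states:
# batch of 2 sequences, 4 tokens each, hidden size 1024 (as in bge-large-en).
rng = np.random.default_rng(0)
last_hidden = rng.normal(size=(2, 4, 1024))

# BGE-style sentence embedding: take the [CLS] token (position 0) ...
cls_embeddings = last_hidden[:, 0, :]

# ... then L2-normalize, so a plain dot product equals cosine similarity.
norms = np.linalg.norm(cls_embeddings, axis=1, keepdims=True)
embeddings = cls_embeddings / norms

print(embeddings.shape)                     # (2, 1024)
print(np.linalg.norm(embeddings, axis=1))   # both ~1.0
```

Because the vectors are unit-length, downstream similarity scoring reduces to matrix multiplication, which is what makes batched FP16 retrieval efficient.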

Core Capabilities

  • High-performance text embedding generation for retrieval tasks
  • Excellent performance on classification, clustering, and reranking
  • Strong generalization across diverse English-language tasks
  • Efficient integration with popular frameworks like Sentence-Transformers and Langchain

Frequently Asked Questions

Q: What makes this model unique?

The model achieves state-of-the-art performance on the MTEB benchmark and provides specialized query instruction handling for improved retrieval performance. It's particularly notable for its balance of efficiency and accuracy.
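The query instruction handling mentioned above amounts to prepending a fixed instruction string to short queries (but not to passages) before encoding. A minimal sketch, assuming the retrieval instruction published for the BGE English models (verify the exact string against BAAI's current documentation):

```python
# Hedged assumption: this is the retrieval instruction documented for
# the BGE English models; confirm against the official model card.
INSTRUCTION = "Represent this sentence for searching relevant passages: "

def with_instruction(query: str) -> str:
    """Prepend the retrieval instruction to a short query.

    Only queries receive the instruction; corpus passages are
    encoded as-is.
    """
    return INSTRUCTION + query

print(with_instruction("what is a text embedding?"))
```

Skipping the instruction is fine for symmetric tasks such as sentence similarity; it mainly helps asymmetric query-to-passage retrieval.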

Q: What are the recommended use cases?

The model excels in semantic search, document retrieval, similarity comparison, and text classification tasks. It's particularly well-suited for production environments requiring high-quality text embeddings.
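As a sketch of the semantic-search use case, the snippet below ranks a toy corpus against a query by dot product. The 3-dimensional vectors are illustrative stand-ins for the model's 1024-dimensional output; they are pre-normalized, so the dot product is cosine similarity, exactly as with real bge-large-en embeddings.

```python
import numpy as np

def top_k(query_emb, corpus_embs, k=2):
    """Return the indices and scores of the k most similar corpus vectors.

    With L2-normalized embeddings, the dot product is cosine similarity.
    """
    scores = corpus_embs @ query_emb
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy, pre-normalized 3-d vectors standing in for 1024-d model output.
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.8, 0.6, 0.0]])
query = np.array([1.0, 0.0, 0.0])

idx, scores = top_k(query, corpus)
print(idx)  # [0 2] -- the two corpus vectors closest to the query
```

In production, the same ranking step is typically delegated to a vector store, but the underlying scoring is this matrix-vector product.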
