bce-reranker-base_v1

Maintained By: maidalun1020

Developer: NetEase Youdao
License: Apache 2.0
Languages: English, Chinese, Japanese, Korean
Primary Use: Document Reranking for RAG

What is bce-reranker-base_v1?

bce-reranker-base_v1 is a specialized reranking model developed by NetEase Youdao as part of their BCEmbedding framework. It's designed specifically for improving retrieval-augmented generation (RAG) applications by providing high-quality document reranking capabilities across multiple languages. The model can handle documents in English, Chinese, Japanese, and Korean, making it particularly valuable for multilingual applications.

Implementation Details

The model implements a cross-encoder architecture optimized for reranking tasks. It can process text passages beyond the typical 512 token limit and provides meaningful similarity scores that can be used for filtering low-quality passages (recommended threshold: 0.35 or 0.4). In typical RAG pipelines, it's used to rerank the top 50-100 passages retrieved by an embedding model, ultimately selecting the top 5-10 passages for final use.

  • Multilingual support for English, Chinese, Japanese, and Korean
  • Optimized for RAG applications across various domains
  • Provides absolute similarity scores for quality filtering
  • Handles long text passages effectively
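As a rough illustration of the scoring and threshold filtering described above, the sketch below scores query-passage pairs through the standard Hugging Face transformers cross-encoder interface. The example query and passages are hypothetical, the sigmoid mapping to [0, 1] scores follows common cross-encoder practice rather than an official recipe, and for simplicity the inputs are truncated at 512 tokens (the official BCEmbedding library handles longer passages itself); verify the details against the BCEmbedding examples.

```python
# Minimal reranking sketch, assuming the model loads as a standard
# cross-encoder via AutoModelForSequenceClassification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "maidalun1020/bce-reranker-base_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical example inputs
query = "How do I filter low-quality passages in a RAG pipeline?"
passages = [
    "Rerankers score each query-passage pair and keep only the highest-scoring passages.",
    "The capital of Japan is Tokyo.",
]

# Score every (query, passage) pair with the cross-encoder
pairs = [[query, p] for p in passages]
inputs = tokenizer(pairs, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.view(-1).float()
scores = torch.sigmoid(logits).tolist()  # absolute similarity scores in [0, 1]

# Drop passages below the recommended quality threshold (0.35-0.4)
kept = [(p, s) for p, s in zip(passages, scores) if s >= 0.35]
print(kept)
```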

Core Capabilities

  • Cross-lingual reranking across four languages
  • Superior performance across multiple domains, including education, law, finance, medicine, and literature
  • Integrated support for major frameworks like LangChain and LlamaIndex
  • State-of-the-art performance in RAG evaluation metrics

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to handle four languages while providing meaningful similarity scores makes it stand out. It's specifically optimized for RAG applications and has shown superior performance compared to other reranking models in multilingual scenarios.

Q: What are the recommended use cases?

The model is best suited for RAG applications where precise document reranking is crucial. It's particularly effective when used in combination with an embedding model (like bce-embedding-base_v1) for initial retrieval, followed by reranking the top 50-100 passages to select the most relevant 5-10 documents.
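A hedged sketch of that two-stage setup is shown below: bce-embedding-base_v1 handles first-stage retrieval and bce-reranker-base_v1 rescores the candidates, here via the sentence-transformers library. The corpus and query strings are placeholders, and the assumption that both models load cleanly through SentenceTransformer/CrossEncoder should be checked against the official BCEmbedding documentation; the candidate and final cut-offs mirror the 50-100 → 5-10 numbers above.

```python
# Two-stage RAG retrieval sketch: dense retrieval, then cross-encoder reranking.
# Assumes both models load via sentence-transformers; verify against BCEmbedding docs.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

embedder = SentenceTransformer("maidalun1020/bce-embedding-base_v1")
reranker = CrossEncoder("maidalun1020/bce-reranker-base_v1", max_length=512)

corpus = ["passage 1 ...", "passage 2 ...", "passage 3 ..."]  # hypothetical document store
corpus_emb = embedder.encode(corpus, normalize_embeddings=True, convert_to_tensor=True)

query = "example user question"  # placeholder query
query_emb = embedder.encode(query, normalize_embeddings=True, convert_to_tensor=True)

# Stage 1: the embedding model retrieves the top candidates (50-100 in a real pipeline)
hits = util.semantic_search(query_emb, corpus_emb, top_k=min(50, len(corpus)))[0]
candidates = [corpus[h["corpus_id"]] for h in hits]

# Stage 2: the reranker rescores each (query, candidate) pair; keep the best 5-10
scores = reranker.predict([(query, c) for c in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:5]
```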
