jina-embeddings-v2-base-zh


jinaai

A bilingual Chinese-English embedding model with 161M parameters and support for sequences of up to 8192 tokens. It is based on BERT with ALiBi and designed for text retrieval and RAG applications.

Property             Value
Parameter Count      161M
License              Apache 2.0
Paper                Multi-Task Contrastive Learning Paper
Max Sequence Length  8192 tokens
Languages            Chinese, English

What is jina-embeddings-v2-base-zh?

Jina Embeddings V2 Base ZH is a bilingual embedding model designed specifically for Chinese and English text. It is built on a modified BERT architecture that uses symmetric bidirectional ALiBi, which lets it handle sequences of up to 8192 tokens, and it is intended for both monolingual and cross-lingual applications.

Implementation Details

The model uses a custom JinaBERT architecture that applies ALiBi (Attention with Linear Biases) instead of position embeddings, which lets it handle long input sequences. It is optimized for FP16 precision, and mean pooling over the token embeddings is recommended for producing sentence embeddings.
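Mean pooling averages the per-token embeddings from the encoder into one sentence vector, counting only the positions the attention mask marks as real tokens so that padding does not dilute the result. The sketch below uses plain Python lists in place of real model outputs; `mean_pool` is a hypothetical helper, not part of any Jina library.

```python
def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the embeddings of unmasked tokens into one sentence vector."""
    if not token_embeddings:
        raise ValueError("no token embeddings given")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # mask value 1 = real token, 0 = padding
            for d in range(dim):
                totals[d] += vector[d]
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in totals]


# Two real tokens plus one padding position that must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Tensor libraries usually express the same idea as a masked sum divided by the number of unmasked tokens.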

  • Bilingual Chinese-English support that handles mixed-language input without bias toward either language
  • Supports sequence lengths up to 8192 tokens
  • Implements symmetric bidirectional ALiBi for improved attention mechanisms
  • Optimized for both monolingual and cross-lingual retrieval tasks
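ALiBi adds a fixed, head-specific penalty to each attention score rather than using learned position embeddings. In the symmetric bidirectional variant listed above, the penalty for query position i attending to key position j is -m·|i - j| for a head with slope m, so distance is penalized equally in both directions. The sketch below uses the slope schedule from the original ALiBi reference implementation; whether JinaBERT computes its slopes exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes following the original ALiBi recipe (a geometric sequence)."""
    def power_of_two_slopes(n: int) -> list[float]:
        start = 2 ** (-8 / n)  # for n = 8 heads: 1/2, 1/4, ..., 1/256
        return [start ** (h + 1) for h in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_two_slopes(num_heads)
    # Non-power-of-two head counts (such as 12): take the closest lower power of two,
    # then fill the remainder with every other slope from the next power of two.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def symmetric_alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Bias matrix b[i][j] = -slope * |i - j|, identical in both directions."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


def biased_scores(scores: list[list[float]], slope: float) -> list[list[float]]:
    """Add the symmetric ALiBi bias to raw query-key attention scores (before softmax)."""
    bias = symmetric_alibi_bias(len(scores), slope)
    return [[s + b for s, b in zip(row, bias_row)] for row, bias_row in zip(scores, bias)]


print(alibi_slopes(8))  # [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
```

Because the bias depends only on relative distance, nothing in it is tied to a maximum trained length, which is the property that lets ALiBi-based models accept long inputs.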

Core Capabilities

  • Text embedding generation for both Chinese and English content
  • Retrieval for RAG (Retrieval-Augmented Generation) pipelines
  • Sentence similarity within and across the two languages
  • Handling of long documents with a context of up to 8192 tokens
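Retrieval and sentence-similarity tasks typically compare embeddings with cosine similarity: the query and each document are embedded by the model, and documents are ranked by how closely their vectors point in the same direction. The sketch below uses tiny hand-written vectors as stand-ins for real model outputs (which have many more dimensions); the document IDs, values, and helper names are illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query: list[float], docs: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the IDs of the top_k documents most similar to the query."""
    ranked = sorted(docs, key=lambda doc_id: cosine_similarity(query, docs[doc_id]), reverse=True)
    return ranked[:top_k]


# Toy vectors standing in for the embeddings of a query such as
# "今天天气怎么样?" ("How is the weather today?") and three documents.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = {
    "weather_zh": [0.9, 0.1, 0.0],
    "weather_en": [0.8, 0.2, 0.1],
    "finance_en": [0.0, 0.0, 1.0],
}
print(rank_documents(query_vec, doc_vecs))  # ['weather_zh', 'weather_en', 'finance_en']
```

With a bilingual model, a Chinese query and its English counterpart are expected to land close together, so the same ranking code serves monolingual and cross-lingual search.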

Frequently Asked Questions

Q: What makes this model unique?

The model combines bilingual capability tuned specifically for Chinese-English applications with support for long sequences (up to 8192 tokens), a pairing that suits it well to RAG applications.

Q: What are the recommended use cases?

The model is well suited to document retrieval, semantic search, cross-lingual information retrieval, and RAG applications. It is particularly effective for applications that process long documents or mixed Chinese-English content.
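Documents longer than the 8192-token limit still have to be split before embedding. One common approach, shown here as a sketch, slides a fixed-size window over the token IDs with some overlap so that text near a boundary appears in two chunks. The function name and the 256-token default overlap are assumptions for illustration, and in practice the window should leave room for any special tokens the tokenizer adds.

```python
def chunk_token_ids(token_ids: list[int], max_len: int = 8192, overlap: int = 256) -> list[list[int]]:
    """Split token IDs into windows of at most max_len, sharing `overlap` tokens between neighbors."""
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
    return chunks


# A 20-token "document" with an 8-token window and 2 tokens of overlap.
for chunk in chunk_token_ids(list(range(20)), max_len=8, overlap=2):
    print(chunk)
```

This prints three windows starting at tokens 0, 6, and 12; each chunk would then be embedded separately and stored as its own retrieval unit.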
