jina-embeddings-v2-base-zh

Maintained by: jinaai

Jina Embeddings V2 Base ZH

Property               Value
Parameter Count        161M
License                Apache 2.0
Paper                  Multi-Task Contrastive Learning Paper
Max Sequence Length    8192 tokens
Languages              Chinese, English

What is jina-embeddings-v2-base-zh?

Jina Embeddings V2 Base ZH is a bilingual embedding model designed for Chinese and English text. It is built on a modified BERT architecture that incorporates symmetric bidirectional ALiBi, which lets it process sequences of up to 8192 tokens in both monolingual and cross-lingual applications.
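To make the idea of symmetric bidirectional ALiBi concrete, here is a minimal sketch in plain Python. It assumes the standard ALiBi slope schedule for a power-of-two head count and replaces the causal distance with an absolute distance so the penalty is the same in both directions. The function names are hypothetical, and the slope values actually used by this model are not stated in this document.

```python
def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence (assumes num_heads is a power of two)."""
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]


def symmetric_alibi_bias(seq_len, slope):
    """Bias added to attention scores: -slope * |i - j| for query i and key j.

    Using the absolute distance makes the bias symmetric, so tokens to the
    left and right of a query are penalized equally.
    """
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


slopes = alibi_slopes(8)                      # one slope per attention head
bias = symmetric_alibi_bias(4, slopes[0])     # 4x4 bias matrix for the first head
```

Because the bias depends only on the distance between positions, nothing in it is tied to a fixed maximum length, which is the property that allows inference on sequences longer than those seen in training.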

Implementation Details

The model uses a custom JinaBERT architecture that applies ALiBi (Attention with Linear Biases) to support extended sequence lengths. It is optimized for FP16 precision, and sentence embeddings should be produced by mean pooling over the token embeddings.

  • Advanced bilingual capabilities with unbiased processing of mixed Chinese-English input
  • Supports sequence lengths up to 8192 tokens
  • Implements symmetric bidirectional ALiBi, which adds distance-based linear biases to attention scores
  • Optimized for both mono-lingual and cross-lingual retrieval tasks
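The mean pooling step mentioned above can be sketched as follows. This is an illustrative, dependency-free version that assumes the encoder returns one vector per token along with an attention mask marking padding positions; `mean_pool` and the toy data are hypothetical and not part of the model's API.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                total[k] += vector[k]
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]


# Toy example: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
```

Masking matters here: without it, padding vectors would be averaged into the sentence embedding and distort it for short inputs in a padded batch.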

Core Capabilities

  • Text embedding generation for both Chinese and English content
  • Retrieval for RAG (Retrieval-Augmented Generation) applications
  • Sentence similarity within and across the two languages
  • Long-document handling, with inputs of up to 8192 tokens

Frequently Asked Questions

Q: What makes this model unique?

The model's main strength is its bilingual design, optimized specifically for Chinese-English applications, combined with support for sequences of up to 8192 tokens and its suitability for RAG applications.

Q: What are the recommended use cases?

The model excels in document retrieval, semantic search, cross-lingual information retrieval, and RAG applications. It's particularly effective for applications requiring processing of long documents or mixed Chinese-English content.
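As a rough illustration of the semantic search use case, the sketch below ranks documents by cosine similarity to a query. It assumes the embeddings have already been computed; the short vectors here are made-up placeholders rather than real model output, and `rank_documents` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]


# Placeholder embeddings standing in for a query and three documents.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
ranking = rank_documents(query, docs)
```

In a cross-lingual setting the same ranking applies unchanged: a Chinese query and English documents are compared in one shared embedding space.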
