jina-embeddings-v2-base-de

Maintained by: jinaai

  • Parameter Count: 161M
  • License: Apache 2.0
  • Paper: Multi-Task Contrastive Learning Paper
  • Max Sequence Length: 8192 tokens

What is jina-embeddings-v2-base-de?

jina-embeddings-v2-base-de is a powerful bilingual text embedding model designed specifically for German and English language processing. Built on a modified BERT architecture (JinaBERT), it leverages symmetric bidirectional ALiBi to handle exceptionally long sequences up to 8192 tokens. The model excels in both monolingual and cross-lingual applications, particularly in scenarios involving mixed German-English content.

Implementation Details

The model uses mean pooling to produce sentence embeddings and integrates easily with popular frameworks such as transformers or sentence-transformers; a minimal usage sketch follows the list below. It has been evaluated extensively on the MTEB benchmark and shows strong performance across a range of German and English tasks.

  • Architecture: Modified BERT with symmetric ALiBi
  • Parameter Count: 161 million
  • Maximum Sequence Length: 8192 tokens
  • Supported Languages: German and English
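
As a minimal usage sketch (not an official snippet), the example below loads the model with transformers and applies mean pooling over the last hidden states. The example sentences and the trust_remote_code flag are assumptions for illustration, not taken from this card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative setup; trust_remote_code is assumed to be needed for the
# custom JinaBERT modeling code.
model_id = "jinaai/jina-embeddings-v2-base-de"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

sentences = [
    "Wie ist das Wetter heute?",        # German
    "What is the weather like today?",  # English
]
inputs = tokenizer(sentences, padding=True, truncation=True,
                   max_length=8192, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])

# Cosine similarity between the German and English sentence embeddings.
score = torch.nn.functional.cosine_similarity(embeddings[0:1], embeddings[1:2]).item()
print(f"cosine similarity: {score:.3f}")
```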

Core Capabilities

  • Bilingual text embedding generation
  • Long sequence processing (up to 8192 tokens)
  • High performance in cross-lingual applications
  • Efficient mean pooling implementation
  • Strong performance on the MTEB benchmark

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to handle extremely long sequences (up to 8192 tokens) and its optimization for German-English bilingual content set it apart. Its symmetric bidirectional ALiBi attention is what enables the long context window, while its bilingual training makes it particularly effective for cross-lingual applications.

Q: What are the recommended use cases?

The model excels in applications such as cross-lingual information retrieval, semantic search, document similarity analysis, and RAG (Retrieval-Augmented Generation) systems. It is particularly effective when working with mixed German-English content or when long-sequence processing is needed.
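
As an illustration of the semantic-search use case, the following sketch uses sentence-transformers to rank a small mixed German/English corpus against an English query. The corpus, the query, and the trust_remote_code flag are assumptions for the example.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative cross-lingual retrieval; trust_remote_code is assumed,
# as in the transformers example above.
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-de",
                            trust_remote_code=True)

corpus = [
    "Die Rechnung wurde am 3. Mai beglichen.",           # "The invoice was settled on May 3."
    "Our return policy allows refunds within 30 days.",
    "Der Vertrag läuft Ende des Jahres aus.",            # "The contract expires at year's end."
]
query = "When can I get my money back?"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Rank the mixed German/English documents by cosine similarity to the query.
hits = util.semantic_search(query_emb, corpus_emb, top_k=len(corpus))[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```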
