GLuCoSE-base-ja-v2

Maintained by: pkshatech

Property                  Value
Parameter Count           133M
License                   Apache-2.0
Maximum Sequence Length   512 tokens
Output Dimensions         768
Language                  Japanese

What is GLuCoSE-base-ja-v2?

GLuCoSE-base-ja-v2 is a Japanese text embedding model designed for retrieval tasks. It builds on the original GLuCoSE model and was fine-tuned in multiple stages, combining distillation from larger models with contrastive learning. It is reported to achieve state-of-the-art performance among similarly sized models on a range of Japanese language tasks while remaining efficient to run.

Implementation Details

Training follows a three-step approach: ensemble distillation from teacher models such as E5-mistral and gte-Qwen2, contrastive learning on multiple datasets, and search-specific optimization. Embeddings are compared with cosine similarity, and every input must carry a prefix: "query:" for search queries or "passage:" for the documents being searched.

  • Designed to be practical for CPU inference
  • Reports 85.5% Recall@5 on the MIRACL benchmark
  • Supports both SentenceTransformers and Transformers implementations
  • Features 768-dimensional output embeddings
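The prefixing and cosine-similarity steps described above can be sketched in Python as follows. This is a minimal illustration, not the maintainers' reference code: the helper names (`with_prefix`, `cosine_similarity`, `embed`) are hypothetical, the model id `pkshatech/GLuCoSE-base-ja-v2` and the single space after the colon in each prefix are assumptions, and `embed` only works if the `sentence-transformers` package is installed and the model can be downloaded.

```python
import math
from typing import List, Sequence

# Assumed Hugging Face model id; adjust if the model is hosted elsewhere.
MODEL_ID = "pkshatech/GLuCoSE-base-ja-v2"


def with_prefix(text: str, kind: str) -> str:
    """Prepend 'query: ' or 'passage: ' (space after the colon is an assumption)."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return f"{kind}: {text}"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embed(texts: List[str]) -> List[List[float]]:
    """Encode already-prefixed texts with SentenceTransformers.

    The import is done lazily so the helpers above stay usable
    without the package installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(texts).tolist()
```

With the package installed, calling `embed([with_prefix(q, "query"), with_prefix(p, "passage")])` for a query string `q` and a passage string `p` should yield two vectors that can be passed directly to `cosine_similarity`.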

Core Capabilities

  • High-performance text retrieval and semantic search
  • Sentence similarity computation
  • Document embedding and comparison
  • Cross-lingual capability with Japanese focus
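As a rough illustration of the retrieval use case, the sketch below ranks passage embeddings against a query embedding. It assumes the embeddings have already been computed; the function names (`l2_normalize`, `top_k`) are hypothetical, and the 3-dimensional toy vectors merely stand in for the model's 768-dimensional output. Each vector is L2-normalized first, so the dot product equals the cosine similarity.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (passage index, cosine score) pairs for the k best passages."""
    q = l2_normalize(query_vec)
    scored = []
    for idx, vec in enumerate(passage_vecs):
        p = l2_normalize(vec)
        # Dot product of unit vectors is their cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, p))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real query and passage embeddings.
toy_query = [1.0, 0.0, 0.0]
toy_passages = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = top_k(toy_query, toy_passages, k=2)
```

Here passage 1 points in the same direction as the query and ranks first, followed by passage 2. For large collections a vector index would normally replace this linear scan.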

Frequently Asked Questions

Q: What makes this model unique?

GLuCoSE-base-ja-v2 combines strong performance on Japanese language tasks with a relatively small parameter count (133M). It is reported to be competitive with larger models such as multilingual-e5-large (600M parameters) while being cheaper to deploy.

Q: What are the recommended use cases?

The model is suited to Japanese text retrieval, semantic search, and sentence similarity measurement. It is a particularly good fit for production environments that require CPU inference, such as search engines, recommendation systems, and document comparison tools.
