KURE-v1
| Property | Value |
|---|---|
| Developer | NLP&AI Lab |
| Base Model | BAAI/bge-m3 |
| License | MIT |
| Embedding Dimension | 1024 |
| Sequence Length | 8192 |
What is KURE-v1?
KURE-v1 (Korea University Retrieval Embedding) is an embedding model specialized for Korean text retrieval. Fine-tuned from BAAI/bge-m3 using CachedGISTEmbedLoss, it consistently outperforms other multilingual embedding models on Korean retrieval tasks.
Implementation Details
The model was trained on 2 million Korean query-document pairs, each with 5 hard negatives. Training used a batch size of 4096 and a learning rate of 2e-05, and ran for one epoch with the CachedGISTEmbedLoss from sentence-transformers.
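Based on those reported hyperparameters, the setup would look roughly like the sketch below, written against the sentence-transformers v3 training API. The guide model, the dataset column layout, and the mini_batch_size value are assumptions for illustration, not published details of KURE-v1.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedGISTEmbedLoss

# Student model: the bge-m3 base that KURE-v1 was fine-tuned from.
model = SentenceTransformer("BAAI/bge-m3")
# CachedGISTEmbedLoss needs a guide model to filter out false negatives;
# the guide used for KURE-v1 is not stated, so bge-m3 is reused here (assumption).
guide = SentenceTransformer("BAAI/bge-m3")

# Toy stand-in for the 2M (query, document, hard negative) examples.
# The real data attached 5 hard negatives per pair; one is shown for brevity.
train_dataset = Dataset.from_dict({
    "anchor": ["한국의 수도는 어디인가요?"],
    "positive": ["대한민국의 수도는 서울이다."],
    "negative": ["일본의 수도는 도쿄이다."],
})

# The cached variant chunks the forward pass into mini-batches so the
# effective batch of 4096 fits in GPU memory.
loss = CachedGISTEmbedLoss(model, guide, mini_batch_size=32)

args = SentenceTransformerTrainingArguments(
    output_dir="kure-v1-sketch",
    num_train_epochs=1,
    per_device_train_batch_size=4096,
    learning_rate=2e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```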
- Supports both Korean and English text processing
- Achieves state-of-the-art performance across 8 benchmark datasets
- Generates 1024-dimensional embeddings efficiently (see the usage sketch below)
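Loading and encoding works like any other sentence-transformers model. The snippet below is a minimal sketch, assuming the Hugging Face model id nlpai-lab/KURE-v1; the example sentences are placeholders.

```python
from sentence_transformers import SentenceTransformer

# Model id assumed from the developer's Hugging Face namespace.
model = SentenceTransformer("nlpai-lab/KURE-v1")

sentences = [
    "한국의 수도는 어디인가요?",    # "What is the capital of Korea?"
    "대한민국의 수도는 서울이다.",  # "The capital of South Korea is Seoul."
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 1024): one 1024-dimensional vector per sentence
```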
Core Capabilities
- Recall of 0.52640 and precision of 0.60551 at top-1 retrieval
- Strong performance across domains including finance, healthcare, legal, and commerce
- Robust handling of long documents with an 8192-token context window
- Supports diverse retrieval tasks, from Wikipedia-based queries to domain-specific applications (see the retrieval sketch below)
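To make the retrieval workflow concrete, here is a short sketch of top-1 document retrieval by embedding similarity. The query and candidate documents are invented placeholders, and model.similarity requires sentence-transformers v3 or later.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nlpai-lab/KURE-v1")  # assumed Hugging Face id

# Placeholder legal-domain query and a small candidate pool.
query = "전세 보증금 반환 절차가 궁금합니다."
documents = [
    "전세 보증금 반환은 임대차 계약 종료 후 임대인에게 청구할 수 있다.",
    "건강보험 적용 범위는 질병의 종류에 따라 달라진다.",
    "주식 배당금은 기업 이익의 일부를 주주에게 분배하는 것이다.",
]

query_emb = model.encode(query)
doc_embs = model.encode(documents)

# Cosine similarity between the query and each candidate; the top-1
# result is the highest-scoring document.
scores = model.similarity(query_emb, doc_embs)  # shape: (1, 3)
best = scores.argmax().item()
print(documents[best], scores[0, best].item())
```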
Frequently Asked Questions
Q: What makes this model unique?
KURE-v1 stands out for its superior performance in Korean text retrieval, consistently outperforming other multilingual models across multiple benchmarks. It's specifically optimized for Korean language understanding while maintaining English language capabilities.
Q: What are the recommended use cases?
The model excels in document retrieval tasks across various domains including finance, healthcare, legal, and public sector applications. It's particularly effective for multi-hop question answering, long document retrieval, and domain-specific information extraction.