# Kanana-nano-2.1b-instruct
| Property | Value |
| --- | --- |
| Parameter Count | 2.1 billion |
| License | CC-BY-NC-4.0 |
| Author | Kakao Corporation |
| Model Type | Instruction-tuned Language Model |
| Paper | arXiv:2502.18934 |
## What is kanana-nano-2.1b-instruct?
Kanana-nano-2.1b-instruct is a compute-efficient bilingual language model developed by Kakao Corporation and designed to perform well in both Korean and English. As the compact member of the larger Kanana model series, this 2.1B-parameter model is optimized for instruction following while retaining particularly strong Korean-language capabilities.
## Implementation Details
The model combines several techniques, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning with distillation, to achieve strong performance despite its small size. It performs especially well on Korean-language benchmarks, scoring 44.80 on KMMLU and 77.09 on HAERAE.
- Optimized for both Korean and English language processing
- Implements advanced training techniques for compute efficiency
- Supports instruction-following capabilities
- Trained without using any Kakao user data
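To build intuition for one of the techniques listed above, here is a minimal sketch of depth up-scaling: a smaller model is deepened by duplicating a contiguous block of its middle layers, after which pre-training continues on the deeper stack. The 8-layer starting point and the duplicated range below are illustrative assumptions, not Kakao's actual recipe.

```python
# Toy illustration of depth up-scaling: deepen a model by duplicating
# a contiguous block of middle layers, then continue pre-training.
# The 8-block base model and the chosen range are illustrative
# assumptions, not the published Kanana configuration.

def depth_up_scale(layers, start, end):
    """Return a deeper layer stack in which layers[start:end] appears twice."""
    return layers[:end] + layers[start:end] + layers[end:]

# A small model with 8 transformer blocks, labeled by index.
base = [f"block_{i}" for i in range(8)]

# Duplicate the middle four blocks (indices 2..5) to get a 12-layer model.
scaled = depth_up_scale(base, 2, 6)

print(len(base), len(scaled))  # 8 12
```

In practice the duplicated blocks start from the trained weights of the originals, which gives the deeper model a much better initialization than training from scratch.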
## Core Capabilities
- Bilingual understanding and generation
- Strong performance on instruction-following tasks (MT-Bench score: 6.400)
- Competitive performance in code-related tasks (HumanEval: 31.10)
- Mathematical reasoning capabilities (GSM8K: 46.32)
## Frequently Asked Questions
**Q: What makes this model unique?**
A: The model stands out for the performance it delivers at its size, particularly on Korean-language tasks. Despite having only 2.1B parameters, it achieves results competitive with larger models on Korean benchmarks.
**Q: What are the recommended use cases?**
A: The model is well suited to bilingual applications that require Korean and English processing, instruction following, and general language understanding. It is particularly effective in scenarios where computational resources are limited but good quality is still required.
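For intuition on the pruning step mentioned under Implementation Details, here is a generic magnitude-pruning sketch, not Kakao's published method: the smallest-magnitude weights are zeroed out, and distillation would then recover quality by training the pruned student to match the full model's outputs.

```python
# Generic magnitude pruning sketch (illustrative, not the Kanana recipe):
# keep only the largest-magnitude weights and zero out the rest.

def magnitude_prune(weights, keep):
    """Zero out all but the `keep` largest-magnitude entries of a weight list."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7]
print(magnitude_prune(w, 3))  # [0.9, 0.0, 0.4, 0.0, -0.7]
```

Real pipelines prune whole structures (layers, heads, hidden dimensions) rather than individual weights, but the principle is the same: remove low-importance capacity, then distill to recover.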