# Ko-Llama3-Luxia-8B
| Property | Value |
|---|---|
| Parameter Count | 8.17B |
| Model Type | Language Model |
| Architecture | Llama-3 |
| License | Llama3 |
| Training Precision | BF16 |
| Context Length | 8K tokens |
## What is Ko-Llama3-Luxia-8B?
Ko-Llama3-Luxia-8B is a specialized Korean language model developed by Saltlux AI Labs, based on Meta's Llama-3 architecture. This model represents a significant advancement in Korean language processing, featuring an expanded vocabulary with 17,536 additional Korean tokens and extensive training on over 100GB of carefully curated Korean text data.
## Implementation Details
The model was trained on 8 NVIDIA H100 80GB GPUs using grouped-query attention (GQA), a learning rate of 1e-5, and a batch size of 128. The training data spans diverse domains, including news, legal documents, patents, medical texts, historical content, and both formal and conversational Korean.
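The reported setup can be summarized as a plain configuration dictionary. This is a hypothetical sketch: the field names and the upstream checkpoint id are assumptions, since the actual training scripts are not published; only the values come from the text above.

```python
# Hypothetical summary of the reported training setup. Field names and the
# upstream checkpoint id are assumptions; the values are taken from the card.
train_config = {
    "base_model": "meta-llama/Meta-Llama-3-8B",  # assumed upstream checkpoint
    "hardware": "8x NVIDIA H100 80GB",
    "attention": "grouped-query attention (GQA)",
    "learning_rate": 1e-5,
    "batch_size": 128,
    "precision": "bf16",
    "context_length": 8192,  # 8K tokens
}

for key, value in train_config.items():
    print(f"{key}: {value}")
```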
- Extended vocabulary size of 145,792 tokens (original Llama-3: 128,256)
- Specialized Korean tokenization capabilities
- 8K token context window
- Trained with BF16 precision
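The vocabulary figures above are internally consistent: the extended size equals the original Llama-3 vocabulary plus the added Korean tokens, as this quick check shows.

```python
# Cross-check of the vocabulary numbers stated above.
llama3_vocab_size = 128_256   # original Llama-3 tokenizer
added_korean_tokens = 17_536  # Korean tokens added by Saltlux
extended_vocab_size = llama3_vocab_size + added_korean_tokens
print(extended_vocab_size)  # 145792
```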
## Core Capabilities
- Enhanced Korean text generation and understanding
- Improved tokenization of Korean phrases and sentences
- Maintains English language capabilities while specializing in Korean
- Suitable for various natural language tasks in Korean
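For reference, a minimal inference sketch with Hugging Face Transformers might look as follows. The repository id is an assumption based on the model name (verify it on the Hub before use), and loading the model downloads roughly 16 GB of BF16 weights.

```python
# Minimal inference sketch, assuming the model is published under the id
# below on the Hugging Face Hub. Requires `torch` and `transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "saltlux/Ko-Llama3-Luxia-8B"  # assumed Hub repository id

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a Korean completion for the given prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the BF16 training precision
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Example Korean prompt: "The capital of South Korea is"
    print(generate("대한민국의 수도는"))
```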
## Frequently Asked Questions
Q: What makes this model unique?
A: The model's unique strength lies in its specialized Korean language capabilities, achieved through extensive Korean token additions and domain-specific training data, while maintaining the robust foundation of Llama-3's architecture.
Q: What are the recommended use cases?
A: The model is primarily intended for research and can be freely used for a range of natural language generation tasks, particularly those involving Korean-language processing and generation.