Colossal-LLaMA-2-7b-base
| Property | Value |
|---|---|
| License | LLaMA-2 |
| Training Tokens | 8.5B |
| Context Length | 4096 tokens |
| Languages | Chinese, English |
| Paper | Link |
What is Colossal-LLaMA-2-7b-base?
Colossal-LLaMA-2-7b-base is an adaptation of LLaMA-2, developed by the Colossal-AI team, that extends the base model to handle both Chinese and English tasks. It demonstrates that effective bilingual capability can be achieved with modest resources: continual pre-training on roughly 8.5B tokens took about 15 hours on 64 A800 GPUs, at a cost under $1,000.
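Below is a minimal inference sketch using Hugging Face `transformers`. The repository id `hpcai-tech/Colossal-LLaMA-2-7b-base` and the generation settings are assumptions, so verify them against the actual model card on the Hub.

```python
# Minimal inference sketch; the repo id below is an assumption --
# check the Hugging Face Hub for the official checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hpcai-tech/Colossal-LLaMA-2-7b-base"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B model on one GPU
    device_map="auto",
)

prompt = "The three main benefits of bilingual language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```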
Implementation Details
The model employs a multi-stage training approach built around an expanded vocabulary of 69,104 tokens (up from LLaMA-2's original 32,000) to better handle Chinese characters. Training also uses a bucket-based strategy with specific optimizations for balancing the distribution of data sources.
- Extended vocabulary for improved Chinese character handling
- Three-stage training pipeline including knowledge injection and replay
- Bucket-based training for balanced data distribution (see the sketch after this list)
- 4096-token context window
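The exact bucketing recipe isn't detailed here, but the core idea of bucket-based training, drawing each batch from per-source buckets according to fixed mixture weights so that no single source dominates, can be sketched as follows. The bucket names, weights, and documents are hypothetical, for illustration only.

```python
# Illustrative sketch of bucket-based sampling for a balanced data mixture.
# Bucket names, weights, and documents are hypothetical, not the authors' recipe.
import random

buckets = {
    "chinese_web": {"weight": 0.4, "docs": ["zh_doc_1", "zh_doc_2", "zh_doc_3"]},
    "english_web": {"weight": 0.4, "docs": ["en_doc_1", "en_doc_2"]},
    "code":        {"weight": 0.2, "docs": ["code_doc_1"]},
}

def sample_batch(batch_size: int) -> list[str]:
    """Pick each example from a bucket chosen by its mixture weight, so every
    batch matches the target distribution regardless of raw source sizes."""
    names = list(buckets)
    weights = [buckets[n]["weight"] for n in names]
    return [
        random.choice(buckets[random.choices(names, weights=weights, k=1)[0]]["docs"])
        for _ in range(batch_size)
    ]

print(sample_batch(8))
```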
Core Capabilities
- Strong performance on Chinese benchmarks (CEval: 50.20)
- Maintained English language capabilities (MMLU: 53.06)
- Efficient text encoding, with a tokenizer compression rate of 0.659 (lower means fewer tokens per unit of text)
- Balanced performance across multiple evaluation metrics including CMMLU and AGIEval
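The compression rate measures how densely the tokenizer encodes raw text; a common definition is tokens per character, where lower is better. The sketch below assumes that definition, so the number it prints on your corpus may differ from the reported 0.659.

```python
# Rough sketch for measuring a tokenizer's compression rate on a corpus.
# Assumed definition: tokens per character (lower = denser encoding);
# the authors' exact metric and evaluation corpus may differ.
from transformers import AutoTokenizer

def compression_rate(tokenizer, texts: list[str]) -> float:
    n_tokens = sum(len(tokenizer.encode(t, add_special_tokens=False)) for t in texts)
    n_chars = sum(len(t) for t in texts)
    return n_tokens / n_chars

corpus = [
    "今天天气真好，我们去公园散步吧。",
    "The quick brown fox jumps over the lazy dog.",
]
tok = AutoTokenizer.from_pretrained("hpcai-tech/Colossal-LLaMA-2-7b-base")  # repo id assumed
print(f"compression rate: {compression_rate(tok, corpus):.3f}")
```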
Frequently Asked Questions
Q: What makes this model unique?
A: This model achieves impressive bilingual capabilities with minimal additional training resources, demonstrating efficient knowledge transfer from LLaMA-2's foundation. The multi-stage training approach and expanded vocabulary make it particularly effective for Chinese language tasks while maintaining English proficiency.
Q: What are the recommended use cases?
A: The model is well-suited for bilingual applications requiring both Chinese and English language understanding, including text generation, comprehension tasks, and general language modeling; a brief usage sketch follows. It is particularly effective for applications requiring balanced performance across both languages.
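As a brief illustration of bilingual use, the same checkpoint can be prompted in either language through the `transformers` text-generation pipeline; the repo id is the same assumption as in the loading sketch above.

```python
# Bilingual prompting sketch; repo id and settings are assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="hpcai-tech/Colossal-LLaMA-2-7b-base",
    device_map="auto",
)

for prompt in [
    "请用一句话介绍大语言模型。",  # Chinese: one-sentence intro to LLMs
    "In one sentence, what is a large language model?",
]:
    out = generator(prompt, max_new_tokens=48, do_sample=True, top_p=0.9)
    print(out[0]["generated_text"])
```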