Colossal-LLaMA-2-7b-base
| Property | Value |
|---|---|
| License | LLaMA-2 |
| Training Tokens | 8.5B |
| Context Length | 4096 tokens |
| Languages | Chinese, English |
| Paper | Link |
What is Colossal-LLaMA-2-7b-base?
Colossal-LLaMA-2-7b-base is an adaptation of LLaMA-2, developed by the Colossal-AI team, that extends the base model to handle both Chinese and English tasks. It demonstrates that effective bilingual capability can be achieved with modest resources: continual pre-training on roughly 8.5B tokens took about 15 hours on 64 A800 GPUs, at a cost under $1,000.
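Below is a minimal inference sketch using Hugging Face `transformers`. The repository id `hpcai-tech/Colossal-LLaMA-2-7b-base` and the generation settings are assumptions, so verify them against the actual model card on the Hub.

```python
# Minimal inference sketch; the repo id below is an assumption --
# check the Hugging Face Hub for the official checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hpcai-tech/Colossal-LLaMA-2-7b-base"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B model on one GPU
    device_map="auto",
)

prompt = "The three main benefits of bilingual language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```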
Implementation Details
The model employs a multi-stage training approach built around an expanded vocabulary of 69,104 tokens (up from LLaMA-2's original 32,000) to better handle Chinese characters. Training also uses a bucket-based strategy with specific optimizations for balancing the distribution of data sources.
- Extended vocabulary for improved Chinese character handling
- Three-stage training pipeline including knowledge injection and replay
- Bucket-based training for balanced data distribution (see the sketch after this list)
- 4096-token context window
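The exact bucketing recipe isn't detailed here, but the core idea of bucket-based training, drawing each batch from per-source buckets according to fixed mixture weights so that no single source dominates, can be sketched as follows. The bucket names, weights, and documents are hypothetical, for illustration only.

```python
# Illustrative sketch of bucket-based sampling for a balanced data mixture.
# Bucket names, weights, and documents are hypothetical, not the authors' recipe.
import random

buckets = {
    "chinese_web": {"weight": 0.4, "docs": ["zh_doc_1", "zh_doc_2", "zh_doc_3"]},
    "english_web": {"weight": 0.4, "docs": ["en_doc_1", "en_doc_2"]},
    "code":        {"weight": 0.2, "docs": ["code_doc_1"]},
}

def sample_batch(batch_size: int) -> list[str]:
    """Pick each example from a bucket chosen by its mixture weight, so every
    batch matches the target distribution regardless of raw source sizes."""
    names = list(buckets)
    weights = [buckets[n]["weight"] for n in names]
    return [
        random.choice(buckets[random.choices(names, weights=weights, k=1)[0]]["docs"])
        for _ in range(batch_size)
    ]

print(sample_batch(8))
```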
Core Capabilities
- Strong performance on Chinese benchmarks (CEval: 50.20)
- Maintained English language capabilities (MMLU: 53.06)
- Efficient text encoding, with a tokenizer compression rate of 0.659 (lower means fewer tokens per unit of text)
- Balanced performance across multiple evaluation metrics including CMMLU and AGIEval
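The compression rate measures how densely the tokenizer encodes raw text; a common definition is tokens per character, where lower is better. The sketch below assumes that definition, so the number it prints on your corpus may differ from the reported 0.659.

```python
# Rough sketch for measuring a tokenizer's compression rate on a corpus.
# Assumed definition: tokens per character (lower = denser encoding);
# the authors' exact metric and evaluation corpus may differ.
from transformers import AutoTokenizer

def compression_rate(tokenizer, texts: list[str]) -> float:
    n_tokens = sum(len(tokenizer.encode(t, add_special_tokens=False)) for t in texts)
    n_chars = sum(len(t) for t in texts)
    return n_tokens / n_chars

corpus = [
    "今天天气真好，我们去公园散步吧。",
    "The quick brown fox jumps over the lazy dog.",
]
tok = AutoTokenizer.from_pretrained("hpcai-tech/Colossal-LLaMA-2-7b-base")  # repo id assumed
print(f"compression rate: {compression_rate(tok, corpus):.3f}")
```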
Frequently Asked Questions
Q: What makes this model unique?
A: This model achieves impressive bilingual capabilities with minimal additional training resources, demonstrating efficient knowledge transfer from LLaMA-2's foundation. The multi-stage training approach and expanded vocabulary make it particularly effective for Chinese language tasks while maintaining English proficiency.
Q: What are the recommended use cases?
A: The model is well-suited for bilingual applications requiring both Chinese and English language understanding, including text generation, comprehension tasks, and general language modeling; a brief usage sketch follows. It is particularly effective for applications requiring balanced performance across both languages.
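As a brief illustration of bilingual use, the same checkpoint can be prompted in either language through the `transformers` text-generation pipeline; the repo id is the same assumption as in the loading sketch above.

```python
# Bilingual prompting sketch; repo id and settings are assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="hpcai-tech/Colossal-LLaMA-2-7b-base",
    device_map="auto",
)

for prompt in [
    "请用一句话介绍大语言模型。",  # Chinese: one-sentence intro to LLMs
    "In one sentence, what is a large language model?",
]:
    out = generator(prompt, max_new_tokens=48, do_sample=True, top_p=0.9)
    print(out[0]["generated_text"])
```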