Colossal-LLaMA-2-7b-base

hpcai-tech

A bilingual LLaMA-2 variant trained on 8.5B tokens, optimized for Chinese/English tasks. Achieves strong performance on benchmarks with minimal training cost (~$1000).

  • License: LLaMA-2
  • Training Tokens: 8.5B
  • Context Length: 4096 tokens
  • Languages: Chinese, English
  • Paper: Link

What is Colossal-LLaMA-2-7b-base?

Colossal-LLaMA-2-7b-base is an innovative adaptation of LLaMA-2 that has been specifically enhanced to handle both Chinese and English language tasks. Developed by the Colossal-AI team, this model demonstrates that effective bilingual capabilities can be achieved through efficient training strategies, requiring only 15 hours of training on 64 A800 GPUs at a cost under $1,000.
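The cost claim is easy to sanity-check with back-of-envelope arithmetic. The GPU-hour price below is an assumption for illustration; the card only states that the total came in under $1,000.

```python
# Back-of-envelope check of the quoted training cost.
gpu_hours = 15 * 64           # 15 hours on 64 A800 GPUs = 960 GPU-hours
price_per_gpu_hour = 1.0      # assumed cloud rate in USD (not from the card)
cost = gpu_hours * price_per_gpu_hour
print(gpu_hours, cost)        # 960 GPU-hours, ~$960
```

At roughly $1 per A800 GPU-hour, 960 GPU-hours lands just under the stated $1,000 budget.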

Implementation Details

The model employs a unique multi-stage training approach, featuring an expanded vocabulary of 69,104 tokens (up from LLaMA-2's original 32,000) to better handle Chinese characters. The implementation uses a sophisticated bucket-based training strategy and includes specific optimizations for balancing dataset distributions.

  • Extended vocabulary for improved Chinese character handling
  • Three-stage training pipeline including knowledge injection and replay
  • Bucket-based training for balanced data distribution
  • 4096-token context window
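The bucket-based strategy can be pictured as splitting each data source into fixed-size buckets and then interleaving draws so the long-run mix of sources matches target weights. This is a minimal sketch of that idea; the function names, weighting scheme, and bucket size are illustrative assumptions, not details from the release.

```python
import random

def make_buckets(datasets, bucket_size):
    """Split each data source into fixed-size buckets so that every
    training window can draw from all sources in proportion."""
    buckets = {}
    for name, examples in datasets.items():
        buckets[name] = [examples[i:i + bucket_size]
                         for i in range(0, len(examples), bucket_size)]
    return buckets

def balanced_stream(buckets, weights, seed=0):
    """Yield examples so the long-run source mix follows `weights`,
    consuming one whole bucket per draw until all sources are empty."""
    rng = random.Random(seed)
    names = list(buckets)
    while any(buckets[n] for n in names):
        live = [n for n in names if buckets[n]]
        total = sum(weights[n] for n in live)
        name = rng.choices(live, [weights[n] / total for n in live])[0]
        yield from buckets[name].pop(0)

# Toy usage: two sources, balanced 50/50.
data = {"zh": list(range(100)), "en": list(range(100, 200))}
stream = list(balanced_stream(make_buckets(data, 10), {"zh": 0.5, "en": 0.5}))
```

Drawing whole buckets (rather than single examples) keeps locally contiguous data together while still balancing sources over the full run.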

Core Capabilities

  • Strong performance on Chinese benchmarks (CEval: 50.20)
  • Maintained English language capabilities (MMLU: 53.06)
  • Improved tokenizer efficiency: compression rate of 0.659, meaning fewer tokens are needed per unit of text
  • Balanced performance across multiple evaluation metrics including CMMLU and AGIEval
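A tokenizer's compression rate can be computed as tokens emitted per character of input, where lower is better; the exact definition behind the 0.659 figure may differ (e.g., tokens per byte), so treat this as an illustrative sketch.

```python
def compression_rate(texts, tokenize):
    """Ratio of tokens emitted to characters consumed; lower means
    the tokenizer packs more text into each token."""
    tokens = sum(len(tokenize(t)) for t in texts)
    chars = sum(len(t) for t in texts)
    return tokens / chars

# Toy tokenizers to show the metric's behavior:
char_tok = list                                    # one token per character
pair_tok = lambda t: [t[i:i + 2] for i in range(0, len(t), 2)]

rate_char = compression_rate(["hellohello"], char_tok)   # 1.0
rate_pair = compression_rate(["hellohello"], pair_tok)   # 0.5
```

An expanded vocabulary that covers whole Chinese words or characters acts like the pair tokenizer here: the same text is encoded with fewer tokens, lowering the rate.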

Frequently Asked Questions

Q: What makes this model unique?

This model achieves impressive bilingual capabilities with minimal additional training resources, demonstrating efficient knowledge transfer from LLaMA-2's foundation. The multi-stage training approach and expanded vocabulary make it particularly effective for Chinese language tasks while maintaining English proficiency.

Q: What are the recommended use cases?

The model is well-suited for bilingual applications requiring both Chinese and English language understanding, including text generation, comprehension tasks, and general language modeling. It's particularly effective for applications requiring balanced performance across both languages.
