Colossal-LLaMA-2-7b-base

Maintained By
hpcai-tech

  • License: LLaMA-2
  • Training Tokens: 8.5B
  • Context Length: 4096 tokens
  • Languages: Chinese, English
  • Paper: Link

What is Colossal-LLaMA-2-7b-base?

Colossal-LLaMA-2-7b-base is an innovative adaptation of LLaMA-2 that has been specifically enhanced to handle both Chinese and English language tasks. Developed by the Colossal-AI team, this model demonstrates that effective bilingual capabilities can be achieved through efficient training strategies, requiring only 15 hours of training on 64 A800 GPUs at a cost under $1,000.

Implementation Details

The model employs a multi-stage training approach built around an expanded vocabulary of 69,104 tokens (up from LLaMA-2's original 32,000) to better handle Chinese characters. The implementation also uses a bucket-based training strategy with specific optimizations for balancing dataset distributions.

  • Extended vocabulary for improved Chinese character handling
  • Three-stage training pipeline including knowledge injection and replay
  • Bucket-based training for balanced data distribution
  • 4096-token context window
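The card does not spell out how bucket-based training works, so here is a minimal sketch of one plausible interpretation: data is grouped into per-source buckets and sampled by fixed weights, so a small English corpus is not drowned out by a much larger Chinese one. The function name, weights, and toy data are all illustrative, not the Colossal-AI implementation.

```python
import random

def sample_batches(buckets, weights, num_samples, seed=0):
    """Draw training examples from named buckets in proportion to weights.

    buckets: dict mapping bucket name -> list of examples
    weights: dict mapping bucket name -> sampling weight
    (Hypothetical helper; the actual Colossal-AI strategy may differ.)
    """
    rng = random.Random(seed)
    names = list(buckets)
    probs = [weights[n] for n in names]
    samples = []
    for _ in range(num_samples):
        name = rng.choices(names, weights=probs, k=1)[0]
        samples.append((name, rng.choice(buckets[name])))
    return samples

# Toy usage: balance Chinese and English data 1:1 despite unequal bucket sizes.
buckets = {"zh": ["zh_doc_%d" % i for i in range(100)],
           "en": ["en_doc_%d" % i for i in range(10)]}
samples = sample_batches(buckets, {"zh": 0.5, "en": 0.5}, 1000)
zh_count = sum(1 for name, _ in samples if name == "zh")
```

With equal weights, roughly half the draws come from each bucket regardless of how many raw documents each contains, which is the balancing effect the card alludes to.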

Core Capabilities

  • Strong performance on Chinese benchmarks (CEval: 50.20)
  • Maintained English language capabilities (MMLU: 53.06)
  • Efficient compression rate of 0.659 for text encoding
  • Balanced performance across multiple evaluation metrics including CMMLU and AGIEval
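The card quotes a compression rate of 0.659 without defining it; a common definition is tokens produced per character of input, where a lower ratio means each token covers more text. The sketch below uses that assumed definition with a toy whitespace tokenizer, purely to illustrate the metric, not to reproduce the 0.659 figure.

```python
def compression_rate(text, tokenize):
    """Tokens per character: lower means the tokenizer packs more text
    into each token. (Definition assumed; the card's 0.659 figure may
    use bytes or a different corpus.)"""
    tokens = tokenize(text)
    return len(tokens) / len(text)

# Toy whitespace tokenizer for illustration only; a real measurement
# would use the model's expanded 69,104-entry vocabulary.
toy_tokenize = str.split
rate = compression_rate("an expanded vocabulary lowers tokens per character",
                        toy_tokenize)
```

Under this definition, expanding the vocabulary with whole Chinese words and characters drives the ratio down, since text that byte-level tokenizers split into many pieces becomes a single token.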

Frequently Asked Questions

Q: What makes this model unique?

This model achieves impressive bilingual capabilities with minimal additional training resources, demonstrating efficient knowledge transfer from LLaMA-2's foundation. The multi-stage training approach and expanded vocabulary make it particularly effective for Chinese language tasks while maintaining English proficiency.

Q: What are the recommended use cases?

The model is well-suited for bilingual applications that need both Chinese and English language understanding, including text generation, comprehension tasks, and general language modeling, and it is a strong fit wherever balanced performance across the two languages matters more than peak performance in either one.
