CausalLM 14B
| Property | Value |
|---|---|
| License | WTFPL |
| Languages | English, Chinese, Japanese |
| Framework | PyTorch |
| Training Data | 1.3B tokens across 20 datasets |
What is CausalLM 14B?
CausalLM 14B is a 14-billion-parameter language model that performs strongly across multiple benchmarks. Built on the LLaMA 2 architecture, it handles both English and Chinese tasks well and shows notable cross-lingual transfer to other languages.
Implementation Details
The model uses the standard LLaMA 2 architecture without additional RoPE scaling and was trained on a carefully curated dataset of 1.3B tokens. It is compatible with common quantization methods, though the developers recommend using the unquantized base model whenever possible; a loading sketch appears after the benchmark list below. Benchmark highlights:
- Achieves 67.36% average accuracy on MMLU
- Scores 73.10% on CEval, outperforming GPT-4
- Zero-shot accuracy of 70.13% on GSM8K
- DPO version ranks #1 among ~13B models
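For orientation, here is a minimal loading sketch using Hugging Face Transformers. The repository ID `CausalLM/14B`, the bfloat16 dtype, and the device mapping are assumptions for illustration rather than details taken from this page.

```python
# Minimal loading sketch (assumptions: repo ID "CausalLM/14B", bfloat16, GPU available).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/14B"  # assumed Hugging Face repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # unquantized base model, as the developers recommend
    device_map="auto",           # spread layers across available GPUs
)

prompt = "Explain the difference between nuclear fission and fusion in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```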
Core Capabilities
- Multi-lingual understanding and generation
- Strong performance in STEM and humanities tasks
- Efficient speculative sampling capabilities (see the assisted-generation sketch after this list)
- Compatible with visual instruction fine-tuning
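One way to put the speculative sampling capability to use is Transformers' assisted generation, where a smaller draft model proposes tokens that the 14B model then verifies. This is a sketch under assumptions: the draft repository ID `CausalLM/7B` is illustrative, and any smaller model that shares the tokenizer could fill that role.

```python
# Sketch of speculative (assisted) decoding; both repository IDs are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "CausalLM/14B"  # assumed target model
draft_id = "CausalLM/7B"    # assumed smaller draft model sharing the same tokenizer

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Summarize the idea behind speculative sampling.", return_tensors="pt"
).to(target.device)
# assistant_model enables assisted generation: the draft proposes several tokens,
# and the target accepts or rejects them in one verification pass, cutting latency.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```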
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its performance-to-size ratio, outperforming all models under 70B parameters in most quantitative evaluations. It also shows strong cross-lingual capabilities despite its comparatively small, focused 1.3B-token training corpus.
Q: What are the recommended use cases?
The model excels in general text generation, academic question-answering, and multilingual tasks. It's particularly suitable for applications requiring strong reasoning capabilities in both English and Chinese.
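For chat-style use cases such as bilingual question answering, a prompting sketch is shown below. It assumes the tokenizer ships a chat template (ChatML-style prompting is an assumption here, so check the official model card before relying on it).

```python
# Bilingual chat prompt sketch; assumes the tokenizer provides a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/14B"  # assumed repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # "Explain quantum entanglement in Chinese, then give an English summary."
    {"role": "user", "content": "用中文解释量子纠缠，然后给出一段英文摘要。"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```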