
CausalLM 14B

  • Maintained By: CausalLM
  • License: WTFPL
  • Languages: English, Chinese, Japanese
  • Framework: PyTorch
  • Training Data: 1.3B tokens across 20 datasets

What is CausalLM 14B?

CausalLM 14B is a 14-billion-parameter language model built on the LLaMA 2 architecture. It posts strong results across multiple benchmarks, performs well on both English and Chinese tasks, and shows notable cross-lingual transfer despite its focused training data.

Implementation Details

The model uses the standard LLaMA 2 architecture with no additional RoPE scaling and was trained on a curated corpus of 1.3B tokens drawn from 20 datasets. It is compatible with common quantization methods, though the developers recommend the unquantized base model where resources allow.
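
Because the model keeps the stock LLaMA 2 architecture, it loads through the standard Hugging Face transformers classes. A minimal loading sketch, assuming the published CausalLM/14B repository; the dtype and device choices here are illustrative, not an official recommendation:

```python
# Minimal loading sketch via Hugging Face transformers.
# dtype/device choices are illustrative assumptions, not an official spec.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # unquantized base weights, per the note above
    device_map="auto",           # requires the `accelerate` package
)
```

Reported benchmark results include: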

  • Achieves 67.36% average accuracy on MMLU
  • Scores 73.10% on CEval, outperforming GPT-4
  • Zero-shot accuracy of 70.13% on GSM8K
  • DPO version ranks #1 among ~13B models

Core Capabilities

  • Multi-lingual understanding and generation
  • Strong performance in STEM and humanities tasks
  • Efficient speculative sampling (see the sketch after this list)
  • Compatible with visual instruction fine-tuning
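
The speculative-sampling item above can be exercised through transformers' assisted generation, where a smaller model drafts tokens and the 14B model verifies them. A sketch under the assumption that the companion CausalLM/7B checkpoint serves as the draft model:

```python
# Speculative decoding sketch using transformers' assisted generation.
# Pairing with CausalLM/7B as the draft model is an assumption based on
# the speculative-sampling capability noted above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CausalLM/14B")
target = AutoModelForCausalLM.from_pretrained(
    "CausalLM/14B", torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "CausalLM/7B", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Briefly explain speculative sampling.",
                   return_tensors="pt").to(target.device)
# The draft model proposes several tokens; the 14B model accepts or rejects
# them in a single forward pass, preserving the target distribution.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```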

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its performance-to-size ratio: it outperforms all models under 70B parameters in most quantitative evaluations. It also shows strong cross-lingual ability despite training focused primarily on English and Chinese.

Q: What are the recommended use cases?

The model excels in general text generation, academic question-answering, and multilingual tasks. It's particularly suitable for applications requiring strong reasoning capabilities in both English and Chinese.
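
For chat-style use, the upstream model card specifies a ChatML prompt format. A brief generation sketch reusing the model and tokenizer loaded earlier; the system message and question are illustrative:

```python
# ChatML-formatted prompt, per the upstream card's stated prompt format.
# Reuses `model` and `tokenizer` from the loading sketch above.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "用中文简要介绍一下大语言模型。<|im_end|>\n"  # "Briefly introduce LLMs, in Chinese."
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the model's reply.
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                         skip_special_tokens=True)
print(reply)
```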
