CausalLM 7B
| Property | Value |
|---|---|
| Model Size | 7 billion parameters |
| Architecture | LLaMA 2-compatible |
| Training Data | 1.3B tokens of synthetic data |
| Model URL | https://huggingface.co/CausalLM/7B |
What is CausalLM 7B?
CausalLM 7B is a state-of-the-art language model that combines the LLaMA 2 architecture with training on carefully curated synthetic data. It is a distilled version of the CausalLM 14B model, specifically optimized for speculative sampling, and it achieves strong results across a range of benchmarks.
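One way to picture the speculative-sampling angle is Hugging Face transformers' assisted generation, where a small model drafts tokens that a larger one verifies. The sketch below is illustrative, not the authors' setup: it assumes both CausalLM checkpoints load with the stock `AutoModelForCausalLM`/`AutoTokenizer` classes and share a tokenizer, with the 7B acting as the assistant for the 14B.

```python
# Minimal sketch: assisted (speculative) decoding with transformers.
# Assumes CausalLM/7B and CausalLM/14B load with standard Auto classes
# and share a tokenizer; this is not the authors' official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CausalLM/14B")
target = AutoModelForCausalLM.from_pretrained(
    "CausalLM/14B", torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "CausalLM/7B", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain speculative sampling in one sentence.", return_tensors="pt"
).to(target.device)

# `assistant_model` turns on assisted generation: the 7B proposes tokens,
# the 14B accepts or rejects them.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```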
Implementation Details
The model leverages Qwen and LLaMA 2 weights for initialization and maintains full compatibility with the LLaMA 2 architecture. It uses the original Multi-Head Attention (MHA) calculation and standard Rotary Position Embedding (RoPE) without additional scaling.
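These compatibility claims can be spot-checked from the published configuration. The snippet below is a hedged sketch that assumes the checkpoint exposes a standard LLaMA-style config (field names such as `num_key_value_heads` and `rope_scaling` are the usual ones, but may differ for this repository).

```python
# Sketch: inspect the config to check for LLaMA 2-style settings.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("CausalLM/7B")

# Plain MHA: every query head has its own key/value head (no grouped-query attention),
# so these two values should match if the field is present.
print(config.num_attention_heads, getattr(config, "num_key_value_heads", None))

# Standard RoPE with no additional scaling (no linear/NTK `rope_scaling` entry expected).
print(getattr(config, "rope_scaling", None))
```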
- Trained on 1.3B tokens of synthetic data
- 100% synthetic data training approach
- Compatible with GGUF, GPTQ, and AWQ quantization
- Supports the ChatML prompt format (see the example below)
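Since the model expects ChatML, a prompt can be built with the generic ChatML layout and passed through a standard transformers generation call. The sketch below constructs the template by hand; if the tokenizer ships its own chat template, `tokenizer.apply_chat_template` could be used instead. Sampling parameters here are arbitrary examples, not recommended settings.

```python
# Sketch: ChatML-formatted prompt with a standard transformers generation call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Generic ChatML layout: system, user, then an open assistant turn.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 17 * 23?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```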
Core Capabilities
- MMLU Average Accuracy: 63.82% (outperforming models up to 33B)
- CEval Average Accuracy: 70.27% (best among 7B models)
- GSM8K Zero-shot Accuracy: 59.21%
- MT-Bench Score (DPO-α): 7.038125
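For context on what "zero-shot accuracy" on GSM8K-style tasks usually means, the sketch below shows the typical scoring recipe: prompt with the bare question, generate, extract the final number from the output, and compare it with the gold answer. It is illustrative only; the exact prompting and extraction rules of the harness behind the figures above may differ.

```python
# Illustrative scoring helper for GSM8K-style zero-shot evaluation.
import re

def extract_final_number(text: str):
    """Return the last number appearing in a model output, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def score(predictions, references):
    """predictions: raw model outputs; references: gold numeric answers."""
    correct = sum(
        1
        for out, gold in zip(predictions, references)
        if (pred := extract_final_number(out)) is not None
        and abs(pred - float(gold)) < 1e-4
    )
    return correct / len(references)

# Two hypothetical outputs scored against gold answers -> 0.5
print(score(["... so the answer is 42.", "The total is 17 dollars."], [42, 18]))
```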
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its exceptional performance despite its relatively small size, achieving better results than many larger models through sophisticated synthetic data training and distillation techniques.
Q: What are the recommended use cases?
The model is well suited for general language tasks and mathematical reasoning, and can be adapted for multimodal use through its LLaVA 1.5 prompt format compatibility. However, users should implement their own safety filters, as the model was trained on unfiltered internet data.
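Because the model ships without built-in content filtering, deployments need their own safety layer. The sketch below is only a trivial placeholder illustrating where such a gate would sit; the blocklist terms and the `generate_fn` callable are hypothetical, and a real system would use a proper moderation model or API instead.

```python
# Minimal sketch of a post-generation safety gate (placeholder only).
BLOCKLIST = {"example_banned_phrase", "another_banned_phrase"}  # hypothetical terms

def is_safe(text: str) -> bool:
    """Flag outputs containing any blocklisted phrase."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

def guarded_reply(generate_fn, prompt: str) -> str:
    """Wrap any generation callable and withhold flagged outputs."""
    reply = generate_fn(prompt)
    return reply if is_safe(reply) else "[response withheld by safety filter]"
```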