CausalLM 7B
| Property | Value |
|---|---|
| Model Size | 7 billion parameters |
| Architecture | LLaMA 2-compatible |
| Training Data | 1.3B tokens of synthetic data |
| Model URL | https://huggingface.co/CausalLM/7B |
What is CausalLM 7B?
CausalLM 7B is a state-of-the-art language model that combines the LLaMA 2 architecture with training on carefully curated synthetic data. It is a distilled version of the CausalLM 14B model, specifically optimized for speculative sampling, and it achieves strong results across a range of benchmarks.
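One way to picture the speculative-sampling angle is Hugging Face transformers' assisted generation, where a small model drafts tokens that a larger one verifies. The sketch below is illustrative, not the authors' setup: it assumes both CausalLM checkpoints load with the stock `AutoModelForCausalLM`/`AutoTokenizer` classes and share a tokenizer, with the 7B acting as the assistant for the 14B.

```python
# Minimal sketch: assisted (speculative) decoding with transformers.
# Assumes CausalLM/7B and CausalLM/14B load with standard Auto classes
# and share a tokenizer; this is not the authors' official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CausalLM/14B")
target = AutoModelForCausalLM.from_pretrained(
    "CausalLM/14B", torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "CausalLM/7B", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain speculative sampling in one sentence.", return_tensors="pt"
).to(target.device)

# `assistant_model` turns on assisted generation: the 7B proposes tokens,
# the 14B accepts or rejects them.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```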
Implementation Details
The model leverages Qwen and LLaMA 2 weights for initialization and maintains full compatibility with the LLaMA 2 architecture. It uses the original Multi-Head Attention (MHA) calculation and standard Rotary Position Embedding (RoPE) without additional scaling.
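These compatibility claims can be spot-checked from the published configuration. The snippet below is a hedged sketch that assumes the checkpoint exposes a standard LLaMA-style config (field names such as `num_key_value_heads` and `rope_scaling` are the usual ones, but may differ for this repository).

```python
# Sketch: inspect the config to check for LLaMA 2-style settings.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("CausalLM/7B")

# Plain MHA: every query head has its own key/value head (no grouped-query attention),
# so these two values should match if the field is present.
print(config.num_attention_heads, getattr(config, "num_key_value_heads", None))

# Standard RoPE with no additional scaling (no linear/NTK `rope_scaling` entry expected).
print(getattr(config, "rope_scaling", None))
```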
- Trained on 1.3B tokens of synthetic data
- 100% synthetic data training approach
- Compatible with GGUF, GPTQ, and AWQ quantization
- Supports the ChatML prompt format (see the example below)
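Since the model expects ChatML, a prompt can be built with the generic ChatML layout and passed through a standard transformers generation call. The sketch below constructs the template by hand; if the tokenizer ships its own chat template, `tokenizer.apply_chat_template` could be used instead. Sampling parameters here are arbitrary examples, not recommended settings.

```python
# Sketch: ChatML-formatted prompt with a standard transformers generation call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Generic ChatML layout: system, user, then an open assistant turn.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 17 * 23?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```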
Core Capabilities
- MMLU Average Accuracy: 63.82% (outperforming models up to 33B)
- CEval Average Accuracy: 70.27% (best among 7B models)
- GSM8K Zero-shot Accuracy: 59.21%
- MT-Bench Score (DPO-α): 7.038125
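For context on what "zero-shot accuracy" on GSM8K-style tasks usually means, the sketch below shows the typical scoring recipe: prompt with the bare question, generate, extract the final number from the output, and compare it with the gold answer. It is illustrative only; the exact prompting and extraction rules of the harness behind the figures above may differ.

```python
# Illustrative scoring helper for GSM8K-style zero-shot evaluation.
import re

def extract_final_number(text: str):
    """Return the last number appearing in a model output, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def score(predictions, references):
    """predictions: raw model outputs; references: gold numeric answers."""
    correct = sum(
        1
        for out, gold in zip(predictions, references)
        if (pred := extract_final_number(out)) is not None
        and abs(pred - float(gold)) < 1e-4
    )
    return correct / len(references)

# Two hypothetical outputs scored against gold answers -> 0.5
print(score(["... so the answer is 42.", "The total is 17 dollars."], [42, 18]))
```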
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its exceptional performance despite its relatively small size, achieving better results than many larger models through sophisticated synthetic data training and distillation techniques.
Q: What are the recommended use cases?
The model is well suited for general language tasks and mathematical reasoning, and can be adapted for multimodal use through its LLaVA 1.5 prompt format compatibility. However, users should implement their own safety filters, as the model was trained on unfiltered internet data.
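Because the model ships without built-in content filtering, deployments need their own safety layer. The sketch below is only a trivial placeholder illustrating where such a gate would sit; the blocklist terms and the `generate_fn` callable are hypothetical, and a real system would use a proper moderation model or API instead.

```python
# Minimal sketch of a post-generation safety gate (placeholder only).
BLOCKLIST = {"example_banned_phrase", "another_banned_phrase"}  # hypothetical terms

def is_safe(text: str) -> bool:
    """Flag outputs containing any blocklisted phrase."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

def guarded_reply(generate_fn, prompt: str) -> str:
    """Wrap any generation callable and withhold flagged outputs."""
    reply = generate_fn(prompt)
    return reply if is_safe(reply) else "[response withheld by safety filter]"
```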