CausalLM 7B

Property        Value
Maintained By   CausalLM
Model Size      7 Billion parameters
Architecture    LLaMA 2-compatible
Training Data   1.3B tokens of synthetic data
Model URL       https://huggingface.co/CausalLM/7B

What is CausalLM 7B?

CausalLM 7B is a language model that combines the LLaMA 2 architecture with training on carefully curated synthetic data. It is a distilled version of the CausalLM 14B model, specifically optimized for speculative sampling, and delivers strong results across a range of benchmarks for its size.

Implementation Details

The model is initialized from Qwen and LLaMA 2 weights and maintains complete compatibility with the LLaMA 2 architecture, using the original Multi-Head Attention (MHA) computation and standard Rotary Positional Encoding (RoPE) without additional scaling.

  • Trained on 1.3B tokens of synthetic data
  • 100% synthetic data training approach
  • Compatible with GGUF, GPTQ, and AWQ quantization
  • Supports ChatML prompt format
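Because the model is LLaMA 2-compatible, it can be loaded with standard Hugging Face tooling. The sketch below shows ChatML-style inference with transformers; it assumes the repository's tokenizer config provides a ChatML chat template (if it does not, the `<|im_start|>`/`<|im_end|>` markers can be written out by hand), and the generation settings are illustrative rather than recommended values.

```python
# Minimal sketch: ChatML-style inference with transformers.
# Assumes a GPU with enough memory; sampling parameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML prompt format: <|im_start|>role ... <|im_end|>
# apply_chat_template assumes the tokenizer ships a ChatML chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain speculative sampling in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```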

Core Capabilities

  • MMLU Average Accuracy: 63.82% (outperforming models up to 33B)
  • CEval Average Accuracy: 70.27% (best among 7B models)
  • GSM8K Zero-shot Accuracy: 59.21%
  • MT-Bench Score (DPO-α): 7.038125

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its exceptional performance despite its relatively small size, achieving better results than many larger models through sophisticated synthetic data training and distillation techniques.

Q: What are the recommended use cases?

The model is well-suited for general language tasks, mathematical reasoning, and can be adapted for multimodal capabilities through its LLaVA1.5 prompt format compatibility. However, users should implement their own safety filters as the model was trained on unfiltered internet data.
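Since the training data was not filtered, output moderation is left to the integrator. The sketch below illustrates one possible pattern: wrap generation in a post-hoc check before returning text to users. The `generate_reply` callable and the keyword blocklist are placeholders for whatever generation call and moderation model an application actually uses.

```python
# Illustrative post-generation safety gate. The blocklist is a stand-in and
# should be replaced with a real moderation classifier in production.
from typing import Callable

BLOCKED_TERMS = {"example_banned_term"}  # placeholder list, not a real policy


def is_safe(text: str) -> bool:
    """Very rough stand-in for a proper moderation check."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def moderated_generate(generate_reply: Callable[[str], str], prompt: str) -> str:
    """Run the model, then refuse to surface output that fails the check."""
    reply = generate_reply(prompt)
    if not is_safe(reply):
        return "I can't help with that."
    return reply
```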

🍰 Interested in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.