DCLM-7B

Maintained by: apple


  • Parameter Count: 6.89B
  • Training Tokens: 2.5T
  • Context Length: 2048
  • License: Apple ASCL
  • Paper: DataComp-LM paper

What is DCLM-7B?

DCLM-7B is an open-source language model developed by the DataComp for Language Models (DCLM) team. It uses a decoder-only Transformer architecture and was trained on a carefully curated mixture of the DCLM-BASELINE, StarCoder, and ProofPile2 datasets: a pool of roughly 4.1T tokens, of which 2.5T tokens were used for training. The model pairs strong benchmark results with a fully open training dataset and recipe.

Implementation Details

The model has 32 Transformer layers, a hidden dimension of 4096, and 32 attention heads. It was trained with the AdamW optimizer at a peak learning rate of 2e-3 and a weight decay of 0.05, on H100 GPUs with a global batch size of 2048 sequences.

  • Architecture: Decoder-only Transformer
  • Framework: PyTorch with OpenLM (see the loading sketch after this list)
  • Sequence Length: 2048 tokens
  • Training Infrastructure: H100 GPUs
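
Because the model is implemented with OpenLM and distributed in a Hugging Face-compatible format, loading it for inference can be sketched as below. This is illustrative rather than official: the `apple/DCLM-7B` checkpoint id and the `open_lm` import are assumptions based on the framework noted above, not details taken from this page.

```python
# Illustrative loading/generation sketch, not an official recipe.
# Assumes the open_lm package (mlfoundations/open_lm) and transformers are
# installed, and that "apple/DCLM-7B" is the correct checkpoint id.
from open_lm.hf import *  # assumed requirement: registers the OpenLM architecture with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy decoding; prompt plus output must fit in the 2048-token context window.
inputs = tokenizer("Machine learning is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation here is greedy and capped at a few dozen new tokens; sampling behavior can be adjusted through the usual `generate` arguments.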

Core Capabilities

  • Strong performance on MMLU (63.7% few-shot)
  • Excellent results on reasoning tasks (HellaSwag: 80.4%)
  • High accuracy in common sense tasks (COPA: 85%, SIQA: 82.9%)
  • Robust performance on reading comprehension and QA (see the evaluation sketch after this list)
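
The evaluation setup behind these figures is not specified on this page. As a rough sketch, comparable zero-/few-shot scores could be collected with EleutherAI's lm-evaluation-harness; the model id, task names, and the `open_lm` import below are assumptions rather than details taken from this card.

```python
# Illustrative evaluation sketch using EleutherAI's lm-evaluation-harness
# (the lm_eval package); the harness used for the scores above is not stated here.
from open_lm.hf import *  # assumed requirement so Transformers can resolve the architecture
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=apple/DCLM-7B,dtype=bfloat16",  # assumed checkpoint id
    tasks=["hellaswag", "copa"],  # tasks analogous to the benchmarks listed above
    num_fewshot=0,
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```

Scores obtained this way may differ somewhat from the numbers above, since prompt formatting and metric definitions vary between evaluation suites.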

Frequently Asked Questions

Q: What makes this model unique?

DCLM-7B stands out for pairing a fully open training dataset and recipe with strong benchmark results, competitive with similarly sized models (e.g., Mistral 7B and Llama 3 8B) that were trained on closed, proprietary datasets. It performs particularly well on reasoning and QA tasks while keeping its data curation and training methodology fully transparent.

Q: What are the recommended use cases?

The model is well-suited to tasks that require reasoning, reading comprehension, and general knowledge, making it a good fit for academic research, text analysis, and general language understanding. Note that it is a base model: it has not been instruction-tuned or given safety-specific fine-tuning, so outputs should be reviewed before use in user-facing applications.
