# DCLM-7B
| Property | Value |
|---|---|
| Parameter Count | 6.89B |
| Training Tokens | 2.5T |
| Context Length | 2048 tokens |
| License | Apple Sample Code License (ASCL) |
| Paper | DataComp-LM paper |
## What is DCLM-7B?
DCLM-7B is a state-of-the-art open-source language model developed by the DataComp for Language Models (DCLM) team. Built on a decoder-only Transformer architecture, it was trained on carefully curated data: 2.5T tokens drawn from a 4.1T-token corpus that combines the DCLM-BASELINE, StarCoder, and ProofPile2 datasets.
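Below is a minimal generation sketch, assuming the weights can be loaded through the Hugging Face `transformers` interface. The repo id `apple/DCLM-7B`, dtype, and sampling settings are illustrative, and the official model card may list additional requirements (for example, an OpenLM integration package), so check it before running.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical Hub repo id; confirm the exact id and any extra
# dependencies against the official model card.
MODEL_ID = "apple/DCLM-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # keeps a 7B model within a single large GPU
    device_map="auto",
)

prompt = "The key idea behind data curation for language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# DCLM-7B is a base model (no instruction or safety tuning), so plain
# completion-style prompting works best.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```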
## Implementation Details
The model has 32 layers, a hidden dimension of 4096, and 32 attention heads. It was trained with the AdamW optimizer at a peak learning rate of 2e-3 and weight decay of 0.05, on H100 GPUs with a batch size of 2048 sequences. A rough parameter-count estimate based on these shapes follows the list below.
- Architecture: Decoder-only Transformer
- Framework: PyTorch with OpenLM
- Sequence Length: 2048 tokens
- Training Infrastructure: H100 GPUs
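As a sanity check on the 6.89B figure, here is a back-of-the-envelope parameter count derived from the shapes above. The SwiGLU feed-forward width, vocabulary size, and untied input/output embeddings are assumptions made for illustration, not values stated in this card.

```python
def approx_param_count(
    n_layers: int = 32,
    d_model: int = 4096,
    vocab_size: int = 50_304,  # assumed; the exact tokenizer vocab size is not stated here
    ffn_dim: int = 11_008,     # assumed SwiGLU width of roughly (8/3) * d_model, rounded
    tied_embeddings: bool = False,
) -> int:
    """Back-of-the-envelope parameter count for a decoder-only Transformer."""
    attn = 4 * d_model * d_model      # Wq, Wk, Wv, Wo projections
    mlp = 3 * d_model * ffn_dim       # SwiGLU uses three weight matrices
    per_layer = attn + mlp            # norms and biases are negligible at this scale
    embed = vocab_size * d_model
    head = 0 if tied_embeddings else vocab_size * d_model
    return n_layers * per_layer + embed + head

print(f"{approx_param_count() / 1e9:.2f}B parameters")  # prints ~6.89B
```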
## Core Capabilities
- Strong performance on MMLU (63.7% few-shot; a scoring sketch follows this list)
- Excellent results on reasoning tasks (HellaSwag: 80.4%)
- High accuracy in common sense tasks (COPA: 85%, SIQA: 82.9%)
- Robust performance on reading comprehension and QA
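The figures above come from the paper's evaluation suite. As an illustration of how such few-shot, multiple-choice scores are typically computed, the sketch below ranks answer choices by their summed token log-probabilities under the model. It is not the harness used for the reported numbers, and the repo id is assumed as in the earlier example.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "apple/DCLM-7B"  # hypothetical repo id, as above
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

def choice_logprob(context: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` given `context`.

    Tokenization at the context/continuation boundary is handled naively here,
    which is fine for a sketch but not for exact benchmark reproduction.
    """
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    cont_len = full_ids.shape[1] - ctx_ids.shape[1]
    with torch.no_grad():
        logits = model(input_ids=full_ids.to(model.device)).logits[0]
    # Positions :-1 predict tokens 1:, so the last `cont_len` targets are the continuation.
    log_probs = torch.log_softmax(logits[:-1].float(), dim=-1)
    targets = full_ids[0, 1:].to(log_probs.device)
    cont_lp = log_probs[-cont_len:].gather(1, targets[-cont_len:, None]).sum()
    return cont_lp.item()

question = "Question: Which planet is known as the Red Planet?\nAnswer:"
choices = [" Venus", " Mars", " Jupiter", " Saturn"]
scores = [choice_logprob(question, c) for c in choices]
print(choices[scores.index(max(scores))])  # expected: " Mars"
```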
## Frequently Asked Questions
Q: What makes this model unique?
DCLM-7B stands out for combining a fully open training recipe with performance competitive with models trained on closed, proprietary datasets. It is particularly strong on reasoning and QA tasks while maintaining full transparency in its training data and methodology.
Q: What are the recommended use cases?
The model excels in tasks requiring reasoning, comprehension, and general knowledge application. It's particularly well-suited for academic research, text analysis, and general language understanding tasks. However, users should note it hasn't undergone specific safety fine-tuning.