# llemma_7b
| Property | Value |
|---|---|
| License | Llama 2 |
| Paper | arXiv:2310.10631 |
| Training Data | Proof-Pile-2 (includes OpenWebMath) |
| Base Model | Code Llama 7B |
## What is llemma_7b?
Llemma 7B is a language model specialized for mathematical reasoning and computation. Developed by EleutherAI, it is initialized from Code Llama 7B and trained further on the Proof-Pile-2 dataset for 200B tokens, making it particularly adept at mathematical problems and formal proofs.
## Implementation Details
The model builds on the Code Llama 7B architecture while specializing in mathematical applications. It is trained with PyTorch and optimized for text-generation-inference.
- Built on Code Llama 7B base architecture
- Trained on specialized mathematical datasets
- Supports both English language and mathematical notation
- Optimized for chain-of-thought reasoning
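The chain-of-thought style above is typically exercised through few-shot prompting. A minimal sketch of building such a prompt (the exemplar problems and the `build_cot_prompt` helper are illustrative, not part of the model's API):

```python
# Minimal sketch: construct a few-shot chain-of-thought prompt for a new
# math question. The worked examples are illustrative placeholders; any
# step-by-step exemplars in the same format would serve.
FEW_SHOT = [
    ("What is 12 * 7?",
     "12 * 7 = 84. The answer is 84."),
    ("A bag holds 3 red and 5 blue marbles. How many marbles in total?",
     "3 + 5 = 8. The answer is 8."),
]

def build_cot_prompt(question: str) -> str:
    """Concatenate worked examples, then the new question, so the model
    continues the pattern with step-by-step reasoning."""
    parts = [f"Problem: {q}\nSolution: {a}" for q, a in FEW_SHOT]
    parts.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(parts)
```

The resulting string would then be passed to the model for completion; the trailing `Solution:` cues the model to begin its reasoning.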
## Core Capabilities
- Outperforms similarly-sized models on GSM8k (36.4% accuracy)
- Strong performance on MMLU-STEM (37.7% accuracy)
- Exceptional results on SAT math problems (53.1% accuracy)
- Majority voting further boosts accuracy (up to 54.0% on GSM8k)
- Specialized in computational mathematics and formal theorem proving
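The majority-voting figure above comes from sampling several chain-of-thought completions and keeping the most common final answer. A minimal sketch of the aggregation step, assuming final answers have already been extracted from the sampled completions:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled completions.
    Ties break toward the earliest-seen answer (Counter.most_common
    preserves insertion order for equal counts on Python 3.7+)."""
    if not answers:
        raise ValueError("no answers to vote over")
    return Counter(answers).most_common(1)[0][0]

# Example: five sampled final answers for one GSM8k-style problem.
samples = ["42", "42", "41", "42", "38"]
```

More samples generally improve accuracy at proportionally higher inference cost, which is the trade-off behind the 36.4% single-sample versus 54.0% majority-voting numbers.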
## Frequently Asked Questions
Q: What makes this model unique?
Llemma 7B stands out for its specialized focus on mathematical reasoning, significantly outperforming other models of similar size in mathematical tasks. It's particularly notable for achieving better results than both Llama-2 and Code Llama on mathematical benchmarks.
Q: What are the recommended use cases?
The model is ideal for mathematical problem-solving, chain-of-thought reasoning, computational mathematics, and formal theorem proving. It's particularly well-suited for educational applications, research in mathematical reasoning, and automated mathematical proof assistance.
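For automated problem-solving pipelines like these, the model's free-form reasoning has to be reduced to a checkable answer. One common approach is to regex-match a concluding "The answer is …" statement; the phrasing convention below is an assumption for illustration, not something the model enforces:

```python
import re

def extract_final_answer(completion: str):
    """Pull the last 'The answer is X' numeric value out of a
    chain-of-thought completion; return None if absent.
    Assumes the prompt format elicits this concluding phrase."""
    matches = re.findall(r"[Tt]he answer is\s*(-?[\d,\.]+)", completion)
    if not matches:
        return None
    # Keep the last occurrence, drop trailing periods and separators.
    return matches[-1].rstrip(".").replace(",", "")
```

Extracted answers can then be compared against references for evaluation, or fed into a majority vote across samples.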