ZR1-1.5B
| Property | Value |
|---|---|
| Parameter Count | 1.5B |
| Model Type | Reasoning and Coding Model |
| Author | Zyphra |
| Hugging Face | Zyphra/ZR1-1.5B |
What is ZR1-1.5B?
ZR1-1.5B is a specialized AI model focused on reasoning tasks, particularly mathematics and coding. Despite its compact 1.5B-parameter size, it delivers remarkable performance, outperforming Llama-3.1-70B-Instruct on hard coding tasks and reaching 37.91% pass@1 on GPQA-Diamond. It also represents a significant improvement over its base model, R1-Distill-1.5B, with over 50% better performance.
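Since the checkpoint is published as Zyphra/ZR1-1.5B on Hugging Face, it can be pulled with the standard transformers APIs. A minimal loading sketch, assuming the checkpoint behaves as an ordinary causal language model; the dtype and device choices are illustrative, not recommendations from the model card:

```python
# Minimal sketch: load ZR1-1.5B as a standard causal LM with transformers.
# Assumption: Zyphra/ZR1-1.5B is compatible with AutoModelForCausalLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/ZR1-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 1.5B parameters fit easily in bf16 on a single GPU
    device_map="auto",
)
```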
Implementation Details
The model was trained using the PRIME (Process Reinforcement through IMplicit rEwards) algorithm, leveraging a dataset of approximately 400k math and 25k code samples. Training was conducted on an 8xH100 node setup, utilizing progressive context lengthening from 8k to 24k tokens.
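PRIME combines verifiable outcome rewards with implicit process rewards and, per the technique list below, pairs them with an RLOO-style leave-one-out baseline. A simplified sketch of that leave-one-out advantage calculation; the group size and reward values are illustrative, and this is not Zyphra's training code:

```python
# Simplified sketch of a leave-one-out (RLOO) advantage, as used alongside
# PRIME-style rewards. Illustrative only: group size and rewards are made up.
def rloo_advantages(rewards: list[float]) -> list[float]:
    """For each rollout, the baseline is the mean reward of the *other* rollouts."""
    k = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

# Example: 4 rollouts for one prompt, scored 1.0 when the verifier accepts them.
print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))  # [0.667, -0.667, -0.667, 0.667]
```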
- Trained on verified coding and mathematics problems using reinforcement learning
- Employs PRIME + RLOO with token-level granularity
- Uses dynamic batch sizing with accuracy filtering (sketched after this list)
- Implements iterative context lengthening for improved efficiency
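A rough sketch of the accuracy-filtering idea referenced above, under the assumption that prompts whose rollouts are nearly all correct or all incorrect are dropped because they carry little learning signal, with more prompts sampled to refill the batch; the thresholds and helper name are hypothetical, not Zyphra's implementation:

```python
# Rough sketch of accuracy filtering for online RL batches (assumed behavior).
# Prompts with all-correct or all-incorrect rollouts provide little gradient
# signal, so only prompts with mixed outcomes are kept.
def filter_by_accuracy(groups, low=0.2, high=0.8):
    """groups: list of (prompt, [0/1 verifier scores per rollout]) tuples."""
    kept = []
    for prompt, scores in groups:
        acc = sum(scores) / len(scores)
        if low <= acc <= high:  # keep only prompts with mixed outcomes
            kept.append((prompt, scores))
    return kept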
Core Capabilities
- Achieves 74% accuracy on AMPS Hard math problems
- Shows strong performance across various math benchmarks including AIME, AMC, and Olympiad
- Demonstrates 40% accuracy on LeetCode problems
- Maintains high performance with both sampling and greedy decoding (see the generation sketch after this list)
- Supports context lengths up to 32,768 tokens
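A minimal generation sketch contrasting greedy and sampled decoding, assuming the checkpoint loads as a standard causal LM; the prompt and sampling parameters are illustrative rather than recommended settings from the model card:

```python
# Minimal sketch contrasting greedy and sampled decoding with ZR1-1.5B.
# Assumption: standard causal-LM loading; sampling parameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/ZR1-1.5B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Prove that the sum of two odd integers is even. Think step by step."
inputs = tok(prompt, return_tensors="pt").to(model.device)

greedy = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
sampled = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95)

print(tok.decode(greedy[0], skip_special_tokens=True))
print(tok.decode(sampled[0], skip_special_tokens=True))
```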
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to achieve performance comparable to much larger models while maintaining a compact 1.5B parameter size makes it unique. It demonstrates that careful training and architecture design can compensate for model size in specialized tasks.
Q: What are the recommended use cases?
ZR1-1.5B is particularly well-suited for mathematical reasoning, coding problems, and general problem-solving tasks. It excels in scenarios requiring step-by-step reasoning and can handle both short and long-form responses with high accuracy.