Mistral-Small-24B-Instruct-2501-reasoning
| Property | Value |
|---|---|
| Parameter Count | 24 Billion |
| Model Type | Instruction-tuned Language Model |
| License | Apache 2.0 |
| Base Model | mistralai/Mistral-Small-24B-Instruct-2501 |
| Training Infrastructure | 4×8 H100 GPUs |
What is Mistral-Small-24B-Instruct-2501-reasoning?
This is a version of Mistral-Small-24B fine-tuned specifically for mathematical reasoning. Developed by Yenting Lin and funded by Ubitus, it was trained on reasoning datasets including OpenR1-Math-220k and s1K-1.1 and posts strong results across mathematical benchmarks (see Core Capabilities below).
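Since the checkpoint follows the standard Mistral chat format, it can be loaded like any other Hugging Face causal language model. The snippet below is a minimal inference sketch, assuming the repository id `yentinglin/Mistral-Small-24B-Instruct-2501-reasoning` (not stated above) and enough GPU memory for a 24B-parameter model in BF16.

```python
# Minimal inference sketch; the repo id is an assumption, adjust to the actual checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yentinglin/Mistral-Small-24B-Instruct-2501-reasoning"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 training precision noted below
    device_map="auto",
)

messages = [{"role": "user", "content": "Prove that the sum of two odd integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning traces tend to be long, so allow a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```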
Implementation Details
Training used gradient checkpointing, FlashAttention, and Liger kernel plugins for RoPE and RMSNorm. Optimization was done with AdamW under a cosine learning-rate schedule, in BF16 precision, with a sequence length of 32,768 tokens. Additional settings (also reflected in the configuration sketch after this list):
- Utilizes sample packing for efficient training
- Implements gradient accumulation steps of 4
- Features a warmup ratio of 0.1
- Employs DeepSpeed ZeRO-3 optimization
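As a rough illustration, the settings listed above map onto a standard `transformers.TrainingArguments` object as sketched below. Only the values named in this section come from the model card; the learning rate, batch size, epoch count, output directory, and DeepSpeed config path are placeholders, and sequence packing, FlashAttention, and the Liger kernels are handled by the training framework rather than by `TrainingArguments`.

```python
# Sketch of the stated hyperparameters expressed as transformers TrainingArguments.
# Values not named in the list above are illustrative placeholders only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-24b-reasoning",  # placeholder
    bf16=True,                           # BF16 precision
    lr_scheduler_type="cosine",          # cosine learning-rate scheduler
    optim="adamw_torch",                 # AdamW optimizer
    warmup_ratio=0.1,                    # warmup ratio of 0.1
    gradient_accumulation_steps=4,       # gradient accumulation steps of 4
    gradient_checkpointing=True,         # gradient checkpointing
    deepspeed="ds_zero3.json",           # DeepSpeed ZeRO-3 config (hypothetical path)
    per_device_train_batch_size=1,       # placeholder
    learning_rate=1e-5,                  # placeholder
    num_train_epochs=3,                  # placeholder
)
```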
Core Capabilities
- 95.0% accuracy on MATH-500 benchmark
- 53.33% success rate on AIME 2025
- 66.67% accuracy on AIME 2024
- 62.02% performance on GPQA Diamond
- Significantly outperforms the base Mistral-Small-24B-Instruct-2501 model on mathematical tasks
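The scoring protocol behind these numbers is not spelled out here, but accuracy on benchmarks such as MATH-500 and AIME is commonly computed by extracting the final `\boxed{...}` answer from the generated solution and comparing it with the reference. The sketch below illustrates that style of check; the extraction helper and the plain string comparison are simplifying assumptions, not the evaluation code used for the results above.

```python
# Illustrative exact-match scorer for \boxed{...} final answers.
# Simplified assumption about the scoring protocol, not the actual evaluation harness.
import re

def extract_boxed(solution: str) -> str | None:
    """Return the contents of the last \\boxed{...} in a generated solution."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None

def is_correct(generated: str, reference: str) -> bool:
    """Naive exact-match comparison of the extracted final answer."""
    answer = extract_boxed(generated)
    return answer is not None and answer == reference.strip()

# Example: a solution ending in \boxed{42} scored against the reference "42".
print(is_correct(r"... therefore the answer is \boxed{42}.", "42"))  # True
```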
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its specialized mathematical reasoning capabilities, achieving competitive results against larger models like DeepSeek-R1 while maintaining a relatively compact 24B parameter size. It represents a significant improvement over the base Mistral model in mathematical problem-solving tasks.
Q: What are the recommended use cases?
This model is particularly well-suited for complex mathematical problem-solving, including competition-level mathematics (as evidenced by its AIME performance), advanced mathematical reasoning tasks, and general mathematical computation where precise logical thinking is required.
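For batch or production use in these scenarios, the checkpoint can also be served with an offline inference engine. The snippet below is a minimal sketch using vLLM; the repository id, sampling parameters, and context length are assumptions rather than documented recommendations.

```python
# Offline batch inference sketch with vLLM; repo id and sampling values are assumptions.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "yentinglin/Mistral-Small-24B-Instruct-2501-reasoning"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Render the chat template to a plain prompt string for vLLM.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Find all real x with x^2 - 5x + 6 = 0."}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=model_id, dtype="bfloat16", max_model_len=32768)
params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=4096)

for output in llm.generate([prompt], params):
    print(output.outputs[0].text)
```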