# AlphaMaze-v0.2-1.5B
| Property | Value |
|---|---|
| Base Model | DeepSeek-R1-Distill-Qwen-1.5B |
| Parameters | 1.5B |
| Training Method | SFT + GRPO |
| Paper | arXiv:2502.14669 |
| Model Hub | Hugging Face |
## What is AlphaMaze-v0.2-1.5B?
AlphaMaze-v0.2-1.5B is an innovative language model developed by Menlo Research, specifically designed to enhance visual reasoning capabilities in LLMs through maze-solving tasks. Unlike traditional approaches that rely on image generation, AlphaMaze focuses on pure text-based spatial reasoning, demonstrating that models can develop robust visual thinking abilities without generating actual images.
## Implementation Details
The model implements a two-stage training approach: Supervised Fine-Tuning (SFT) using LLaMA-Factory, followed by Group Relative Policy Optimization (GRPO). Training was conducted on multiple datasets, including Maze-Reasoning-v0.1 (420k samples) and Maze-Reasoning-GRPO-v0.1 (180k samples).
- Utilizes Flash Attention 2 for efficient processing
- Trained on 6×A6000 GPUs, achieving near-zero training loss
- Implements specialized maze tokens for spatial representation
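To illustrate the idea of specialized maze tokens, the sketch below serializes a small maze grid into a flat token string. The token names (`<|row-col|>`, `<|..._wall|>`, `<|origin|>`, `<|target|>`) are illustrative placeholders, not the model's actual vocabulary:

```python
# Hypothetical sketch of a text-based maze encoding. Each cell emits a
# coordinate token, one token per wall, and markers for start/goal cells.
WALL_NAMES = ("up", "down", "left", "right")

def maze_to_tokens(walls, origin, target):
    """walls: dict mapping (row, col) -> set of wall directions for that cell.
    Returns a single string of pseudo-tokens describing the maze."""
    tokens = []
    for (r, c), cell_walls in sorted(walls.items()):
        tokens.append(f"<|{r}-{c}|>")
        for direction in WALL_NAMES:
            if direction in cell_walls:
                tokens.append(f"<|{direction}_wall|>")
        if (r, c) == origin:
            tokens.append("<|origin|>")
        if (r, c) == target:
            tokens.append("<|target|>")
    return "".join(tokens)
```

Encoding walls as discrete tokens rather than free-form prose gives the model an unambiguous, compact spatial representation to reason over.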
## Core Capabilities
- Text-based maze navigation and solving
- Strategic path planning and dead-end detection
- Spatial relationship understanding from pure text descriptions
- Mental mapping of maze structures
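For context, the navigation task itself can be solved classically with breadth-first search; a sketch like the one below (hypothetical data format, not the model's interface) is a useful baseline when benchmarking the model's path planning:

```python
# Classical BFS baseline for the maze-navigation task. walls[(r, c)] is the
# set of blocked directions for that cell; cells absent from `walls` are
# outside the maze. Returns the shortest move sequence, or None if unreachable.
from collections import deque

def solve_maze(walls, origin, target):
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    queue = deque([(origin, [])])
    seen = {origin}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == target:
            return path
        for name, (dr, dc) in moves.items():
            nxt = (r + dr, c + dc)
            # Skip blocked directions, out-of-maze cells, and visited cells.
            if name in walls.get((r, c), set()) or nxt not in walls or nxt in seen:
                continue
            seen.add(nxt)
            queue.append((nxt, path + [name]))
    return None
```

Unlike BFS, the model must plan from a purely textual description without an explicit search procedure, which is what makes the task a probe of spatial reasoning.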
## Frequently Asked Questions
**Q: What makes this model unique?**
AlphaMaze-v0.2-1.5B stands out for its ability to develop visual reasoning capabilities purely through text, without requiring image generation. This approach demonstrates that language models can develop sophisticated spatial understanding through carefully designed text-based training.
**Q: What are the recommended use cases?**
The model is particularly suited for tasks requiring spatial reasoning, path planning, and navigation in text-based environments. It can be used for research in AI spatial cognition, educational applications in problem-solving, and as a benchmark for testing LLM visual reasoning capabilities.