AlphaMaze-v0.2-1.5B

Maintained By
homebrewltd

Property         Value
Base Model       DeepSeek-R1-Distill-Qwen-1.5B
Parameters       1.5B
Training Method  SFT + GRPO
Paper            arXiv:2502.14669
Model Hub        Hugging Face

What is AlphaMaze-v0.2-1.5B?

AlphaMaze-v0.2-1.5B is a language model developed by Menlo Research to strengthen visual reasoning in LLMs through maze-solving tasks. Rather than relying on image generation, AlphaMaze performs spatial reasoning entirely in text, demonstrating that a model can develop robust visual thinking abilities without producing actual images.

Implementation Details

The model is trained in two stages: Supervised Fine-Tuning (SFT) using LLaMA-Factory, followed by Group Relative Policy Optimization (GRPO). The training datasets include Maze-Reasoning-v0.1 (420k samples) and Maze-Reasoning-GRPO-v0.1 (180k samples).
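GRPO (Group Relative Policy Optimization, as the method is usually described) scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that normalization step. The reward values are hypothetical, the choice of population standard deviation is an assumption, and none of this is taken from the authors' training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per completion sampled for a single prompt.
    `eps` avoids division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one maze prompt:
# 1.0 = reached the target, 0.0 = did not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and the rest get a negative one. If all completions score the same, every advantage is zero and that prompt contributes no learning signal.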

  • Utilizes Flash Attention 2 for efficient processing
  • Trained on 6× A6000 GPUs, with training loss reaching near zero
  • Implements specialized maze tokens for spatial representation
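To make the idea of maze tokens concrete, here is a minimal sketch that serializes a small grid maze into a token string, one cell at a time. The token spellings (`<|0-0|>`, `<|up_left_wall|>`, `<|origin|>`, and so on) and the helper names are illustrative assumptions; this page does not list the model's actual vocabulary, so check the model card or paper before relying on any of them.

```python
# Fixed order so the same set of walls always yields the same token.
WALL_ORDER = ("up", "down", "left", "right")


def cell_tokens(row, col, walls, marker):
    """Return the (assumed) token triple for one cell: position, walls, marker."""
    names = [w for w in WALL_ORDER if w in walls]
    if names:
        wall_token = "<|" + "_".join(names) + "_wall|>"
    else:
        wall_token = "<|no_wall|>"
    return f"<|{row}-{col}|>{wall_token}<|{marker}|>"


def encode_maze(grid_walls, origin, target):
    """Serialize a grid of per-cell wall sets into one text line per maze row."""
    lines = []
    for r, row in enumerate(grid_walls):
        parts = []
        for c, walls in enumerate(row):
            if (r, c) == origin:
                marker = "origin"
            elif (r, c) == target:
                marker = "target"
            else:
                marker = "blank"
            parts.append(cell_tokens(r, c, walls, marker))
        lines.append("".join(parts))
    return "\n".join(lines)


# Hypothetical 2x2 maze: each cell lists the sides that are blocked.
maze = [
    [{"up", "left"}, {"up", "right", "down"}],
    [{"left", "down"}, {"up", "right", "down"}],
]
encoded = encode_maze(maze, origin=(0, 0), target=(1, 1))
```

Encoding position, walls, and role as separate tokens keeps every cell the same length in tokens, which is one plausible way to let a text-only model line up rows and columns.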

Core Capabilities

  • Text-based maze navigation and solving
  • Strategic path planning and dead-end detection
  • Spatial relationship understanding from pure text descriptions
  • Mental mapping of maze structures
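For comparison with what the model is expected to do in text, the sketch below shows a conventional reference solver: breadth-first search for a shortest path, plus a simple dead-end check. It is a classical baseline for producing ground-truth answers, not a description of how the model reasons internally. The wall representation and function names are assumptions made for this example.

```python
from collections import deque

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def open_neighbors(walls, cell):
    """Yield (move_name, next_cell) for every side of `cell` that is not walled."""
    r, c = cell
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        in_bounds = 0 <= nr < len(walls) and 0 <= nc < len(walls[0])
        if name not in walls[r][c] and in_bounds:
            yield name, (nr, nc)


def shortest_path(walls, origin, target):
    """Breadth-first search; returns a list of move names, or None if unreachable."""
    queue = deque([origin])
    came_from = {origin: None}
    while queue:
        cell = queue.popleft()
        if cell == target:
            break
        for name, nxt in open_neighbors(walls, cell):
            if nxt not in came_from:
                came_from[nxt] = (cell, name)
                queue.append(nxt)
    if target not in came_from:
        return None
    moves = []
    cell = target
    while came_from[cell] is not None:
        cell, name = came_from[cell]
        moves.append(name)
    return moves[::-1]


def dead_ends(walls, origin, target):
    """Cells with exactly one open side, excluding the origin and the target."""
    found = []
    for r in range(len(walls)):
        for c in range(len(walls[0])):
            cell = (r, c)
            if cell in (origin, target):
                continue
            if len(list(open_neighbors(walls, cell))) == 1:
                found.append(cell)
    return found


# Hypothetical 2x2 maze: each cell lists the sides that are blocked.
maze = [
    [{"up", "left"}, {"up", "right", "down"}],
    [{"left", "down"}, {"up", "right", "down"}],
]
path = shortest_path(maze, (0, 0), (1, 1))
traps = dead_ends(maze, (0, 0), (1, 1))
```

In this maze the cell at (0, 1) can only be entered and left through its left side, so a solver that steps right from the origin has to backtrack.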

Frequently Asked Questions

Q: What makes this model unique?

AlphaMaze-v0.2-1.5B stands out for its ability to develop visual reasoning capabilities purely through text, without requiring image generation. This approach demonstrates that language models can develop sophisticated spatial understanding through carefully designed text-based training.

Q: What are the recommended use cases?

The model is particularly suited for tasks requiring spatial reasoning, path planning, and navigation in text-based environments. It can be used for research in AI spatial cognition, educational applications in problem-solving, and as a benchmark for testing LLM visual reasoning capabilities.
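If the model is used as a benchmark, its answers need to be checked automatically. A minimal sketch of such a check follows: it extracts move tokens from generated text, replays them against the maze walls, and reports the fraction of mazes solved. The `<|up|>`-style move tokens, the test-case layout, and the sample outputs are assumptions for illustration, and the parser assumes any reasoning text has already been stripped so that only the final answer remains.

```python
import re

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Assumed answer format: a sequence of tokens such as <|down|><|right|>.
MOVE_TOKEN = re.compile(r"<\|(up|down|left|right)\|>")


def parse_moves(text):
    """Return the move names found in `text`, in order of appearance."""
    return MOVE_TOKEN.findall(text)


def follows_path(walls, origin, target, moves):
    """Replay `moves` from `origin`; True only if no wall is hit and we end on `target`."""
    r, c = origin
    for move in moves:
        if move in walls[r][c]:
            return False  # walked into a wall
        dr, dc = MOVES[move]
        r, c = r + dr, c + dc
        if not (0 <= r < len(walls) and 0 <= c < len(walls[0])):
            return False  # left the grid
    return (r, c) == target


def solve_rate(cases):
    """`cases` is a list of (walls, origin, target, model_output) tuples."""
    solved = sum(
        follows_path(walls, origin, target, parse_moves(output))
        for walls, origin, target, output in cases
    )
    return solved / len(cases)


# Hypothetical 2x2 maze and two hypothetical model outputs.
maze = [
    [{"up", "left"}, {"up", "right", "down"}],
    [{"left", "down"}, {"up", "right", "down"}],
]
cases = [
    (maze, (0, 0), (1, 1), "<|down|><|right|>"),  # valid route
    (maze, (0, 0), (1, 1), "<|right|><|down|>"),  # second move hits a wall
]
rate = solve_rate(cases)
```

Replaying the moves rather than comparing against one reference path means any valid route counts as solved, not only the shortest one.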
