Sky-T1-32B-Preview

Maintained By
NovaSky-AI

Sky-T1-32B-Preview

PropertyValue
Parameter Count32 Billion
Base ModelQwen2.5-32B-Instruct
Training Data17K verified responses
Model URLHuggingFace
DeveloperNovaSky Team, UC Berkeley

What is Sky-T1-32B-Preview?

Sky-T1-32B-Preview is a sophisticated reasoning model developed by the NovaSky Team at UC Berkeley's Sky Computing Lab. Built upon Qwen2.5-32B-Instruct, this model has been fine-tuned with 17,000 verified correct responses, focusing particularly on coding and mathematical reasoning capabilities. The model demonstrates performance comparable to o1-preview across various benchmarks.

Implementation Details

The model was trained using Llama-Factory with DeepSpeed Zero-3 Offload on 8 H100 GPUs, completing training in 19 hours. The training process utilized a batch size of 96 and incorporated supervised fine-tuning techniques.

  • Achieves 82.4% on Math500 benchmark
  • Scores 43.3% on AIME2024
  • Demonstrates strong performance in LiveCodeBench with 86.3% on Easy, 56.8% on Medium, and 17.9% on Hard tasks
  • Shows robust scientific reasoning with 56.8% accuracy on GPQA-Diamond

Core Capabilities

  • Advanced mathematical problem-solving
  • Strong coding abilities across different difficulty levels
  • Scientific reasoning and analysis
  • Competitive performance with leading models in the field

Frequently Asked Questions

Q: What makes this model unique?

Sky-T1-32B-Preview stands out for its focused training on verified correct responses, particularly in mathematics and coding. It achieves performance comparable to o1-preview while being fully open-source and trained with a relatively modest budget of $450.

Q: What are the recommended use cases?

The model is particularly well-suited for mathematical problem-solving, coding tasks, and scientific reasoning applications. It performs especially well on complex mathematical challenges and coding problems across various difficulty levels.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.