Solar Pro Preview Instruct
| Property | Value |
|---|---|
| Parameter Count | 22.1B |
| License | MIT |
| Tensor Type | BF16 |
| Context Length | 4K tokens |
| Language | English |
What is solar-pro-preview-instruct?
Solar Pro Preview is a large language model developed by Upstage with a focus on efficient deployment. The 22B-parameter model is designed to run on a single GPU with 80GB of VRAM while delivering performance comparable to models roughly three times its size. It was built with an enhanced depth up-scaling method that expands a 14B-parameter Phi-3-medium base model to 22B parameters.
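The card does not spell out the up-scaling recipe. As a rough illustration, the sketch below follows the depth up-scaling idea Upstage described for its earlier SOLAR models: two copies of the base model's decoder-layer stack are joined, with a few layers trimmed at the seam, and the result is continually pretrained. The function name, drop count, and layer figures here are hypothetical.

```python
import copy

def depth_up_scale(layers, drop=8):
    """Duplicate a stack of decoder layers, trimming `drop` layers at the seam.

    With n base layers this yields 2 * (n - drop) layers; the up-scaled
    model is then continually pretrained to recover and extend quality.
    """
    head = list(layers[: len(layers) - drop])   # first copy, tail trimmed
    tail = copy.deepcopy(list(layers[drop:]))   # second copy, head trimmed
    return head + tail

# Hypothetical numbers: a 40-layer base with drop=8 yields 64 layers,
# roughly matching the 14B -> 22B growth described above.
```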
Implementation Details
The model uses the ChatML template for conversational and instruction-following tasks. It runs on the Transformers library and pins specific dependency versions, including torch 2.3.1 and flash_attn 2.5.8. Benchmark performance is strong, with scores of 79.14 on MMLU and 84.37 on IFEval. A minimal loading sketch follows the feature list below.
- Enhanced depth up-scaling architecture
- Optimized for single GPU deployment
- Carefully curated training strategy
- Supports instruction-tuned tasks
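As mentioned above, the model uses the ChatML template via the Transformers library; the snippet below is a minimal loading sketch along standard Hugging Face lines. The repository id upstage/solar-pro-preview-instruct matches the model's Hugging Face page, while the trust_remote_code flag and generation settings are assumptions to verify against the official card.

```python
# Minimal usage sketch (verify details against Upstage's official card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upstage/solar-pro-preview-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 tensor type above
    device_map="auto",            # targets a single 80GB GPU
    trust_remote_code=True,       # assumed: custom model code in the repo
)

# apply_chat_template renders the messages with the built-in ChatML template
messages = [{"role": "user", "content": "Explain depth up-scaling in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```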
Core Capabilities
- Superior instruction-following abilities (84.37 on IFEval)
- Strong performance on mathematical reasoning (89.69 on GSM8K)
- Excellent knowledge assessment scores (79.14 on MMLU)
- Efficient resource utilization on a single GPU (see the memory estimate after this list)
- Comprehensive benchmark performance across multiple domains
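As a back-of-the-envelope check on the single-GPU claim (my arithmetic, not an official figure): BF16 stores each parameter in 2 bytes, so the 22.1B weights alone occupy roughly 44 GB, leaving headroom on an 80GB card for the KV cache and activations at the 4K context length.

```python
# Rough VRAM estimate for BF16 inference (assumption-based arithmetic,
# not a measured or official figure).
params = 22.1e9          # parameter count from the table above
bytes_per_param = 2      # BF16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.1f} GB of an 80 GB GPU")  # ~44.2 GB
```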
Frequently Asked Questions
Q: What makes this model unique?
This model combines efficiency with high performance, delivering capabilities comparable to 70B-parameter models while requiring only a single 80GB GPU. Its enhanced depth up-scaling method and carefully curated training strategy enable strong performance on instruction-following and knowledge-based tasks.
Q: What are the recommended use cases?
The model is particularly well-suited for conversational AI applications, instruction-following tasks, and scenarios requiring strong reasoning capabilities. It's ideal for deployments where computational resources are limited but high performance is required.