qihoo360 TinyR1-32B-Preview GGUF

Property	Value
Original Model	TinyR1-32B-Preview
Quantization	GGUF, quantized with an importance matrix (imatrix)
Size Range	9.96 GB to 34.82 GB
Author	bartowski

What is qihoo360_TinyR1-32B-Preview-v0.1-GGUF?

This repository collects GGUF quantizations of qihoo360's TinyR1-32B-Preview, covering a range of hardware configurations and memory budgets. It provides 24 quantization variants, from the high-quality Q8_0 (34.82 GB) down to the compact IQ2_XS (9.96 GB).
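To see what those file sizes mean per weight, divide the file size in bits by the parameter count. The sketch below assumes roughly 32.76 billion parameters (the size of the Qwen2.5-32B architecture that TinyR1-32B-Preview is believed to build on; this page does not state it) and treats "GB" as 10^9 bytes. Both are assumptions, and `bits_per_weight` is a hypothetical helper.

```python
# Assumed parameter count (~32.76e9, the Qwen2.5-32B architecture that
# TinyR1-32B-Preview is believed to build on); not stated on this page.
N_PARAMS = 32.76e9

def bits_per_weight(file_size_gb: float, n_params: float = N_PARAMS) -> float:
    """Average stored bits per weight, assuming 1 GB = 10**9 bytes."""
    return file_size_gb * 1e9 * 8 / n_params

# The two endpoints of the size range quoted above.
for name, size_gb in [("Q8_0", 34.82), ("IQ2_XS", 9.96)]:
    print(f"{name}: {size_gb} GB -> {bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions Q8_0 works out to about 8.5 bits per weight, consistent with Q8_0's layout of a 2-byte scale plus 32 one-byte values per block of 32 weights. IQ2_XS comes out near 2.4 bits per weight; that average also includes any tensors kept at higher precision.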

Implementation Details

The model expects the following prompt format:

<｜begin▁of▁sentence｜>{system_prompt}<｜User｜>{prompt}<｜Assistant｜><｜end▁of▁sentence｜><｜Assistant｜>

The quants were produced with llama.cpp release b4792 using an importance matrix (imatrix). Some variants also keep the embedding and output weights at higher precision.
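When calling a raw text-completion endpoint rather than a chat API, the template has to be filled in by hand. A minimal Python sketch that substitutes the two placeholders into the template exactly as documented; `build_prompt` is a hypothetical name. Note that the special tokens use the fullwidth bar "｜" (U+FF5C) and "▁" (U+2581), not ASCII "|" and "_", so they must be copied verbatim.

```python
# Template copied verbatim from the prompt format above; only the two
# {placeholders} are substituted.
TEMPLATE = (
    "<｜begin▁of▁sentence｜>{system_prompt}<｜User｜>{prompt}"
    "<｜Assistant｜><｜end▁of▁sentence｜><｜Assistant｜>"
)

def build_prompt(prompt: str, system_prompt: str = "") -> str:
    """Fill the documented template for a single-turn request."""
    # str.format parses only TEMPLATE, so braces inside prompt text are safe.
    return TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)

print(build_prompt("What is 2 + 2?", system_prompt="You are a helpful assistant."))
```

Front ends built on llama.cpp can usually apply the chat template stored in the GGUF metadata themselves, so manual formatting like this is mainly needed for plain completion calls.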

"_L" variants (such as Q6_K_L) use Q8_0 for the embedding and output weights
Online weight repacking for faster CPU inference on ARM and AVX systems
I-quant (IQ) variants, which generally give better quality for their size than K-quants at low bit widths
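The "_L" variants keep the embedding and output weights in Q8_0. As a rough illustration of what that format stores, here is a pure-Python sketch modeled on llama.cpp's reference Q8_0 scheme: each block of 32 weights gets one scale d = max|x| / 127 (stored as fp16), and each weight is rounded to an int8 multiple of d. The helper names are hypothetical, and this is an approximation for illustration, not llama.cpp's code.

```python
import math
import struct

QK8_0 = 32  # weights per Q8_0 block

def to_fp16(v: float) -> float:
    """Round-trip a float through IEEE half precision, as the block scale is stored."""
    return struct.unpack("<e", struct.pack("<e", v))[0]

def quantize_block_q8_0(block):
    """Hypothetical helper: quantize 32 floats to (fp16 scale, 32 int8 values)."""
    assert len(block) == QK8_0
    amax = max(abs(v) for v in block)
    d = amax / 127.0                     # largest magnitude maps to +/-127
    inv_d = 1.0 / d if d else 0.0        # an all-zero block gets scale 0
    # Round half away from zero, like C's roundf.
    qs = [int(math.copysign(math.floor(abs(v * inv_d) + 0.5), v)) for v in block]
    return to_fp16(d), qs

def dequantize_block_q8_0(d, qs):
    """Hypothetical helper: reconstruct approximate weights as scale * int8."""
    return [d * q for q in qs]

block = [math.sin(i / 3.0) for i in range(QK8_0)]
d, qs = quantize_block_q8_0(block)
restored = dequantize_block_q8_0(d, qs)
err = max(abs(a - b) for a, b in zip(block, restored))
bits = (2 + QK8_0) * 8 / QK8_0           # 2-byte scale + 32 one-byte values
print(f"max abs error {err:.4f} (scale {d:.4f}), {bits} bits per weight")
```

The 2-byte scale plus 32 one-byte values give 8.5 bits per weight, and the reconstruction error per weight stays around half a scale step, which is why Q8_0 is a near-lossless choice for these sensitive tensors.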

Core Capabilities

Multiple quantization options optimized for different hardware
Support for both CPU and GPU inference
Small IQ2/IQ3 variants for low-RAM environments
Faster CPU inference through online weight repacking

Frequently Asked Questions

Q: What makes this model unique?

The repository offers an unusually wide range of quantization levels, letting users trade file size against output quality and speed on their specific hardware. Providing both K-quants and I-quants adds flexibility across backends: K-quants are generally faster on CPU, while I-quants tend to give better quality for their size and are a good fit for CUDA (cuBLAS) and ROCm (rocBLAS) GPUs.

Q: What are the recommended use cases?

For maximum quality, choose Q6_K_L or Q8_0. Q4_K_M is the recommended default for a balance of size and quality. On resource-constrained systems, the IQ2 and IQ3 variants remain surprisingly usable despite their small size.
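One simple way to apply these recommendations is to pick the largest file that fits in available RAM or VRAM while leaving headroom for the context (KV cache) and runtime buffers. The sketch below assumes a fixed 2 GB headroom, a rule of thumb rather than a measured figure, and only knows the two file sizes quoted on this page; sizes for other variants (Q6_K_L, Q4_K_M, IQ3 and so on) would have to come from the repository's file listing. `pick_variant` is a hypothetical helper.

```python
# Only these two sizes are stated on this page; add others from the repo listing.
KNOWN_SIZES_GB = {"Q8_0": 34.82, "IQ2_XS": 9.96}

def pick_variant(memory_gb, sizes_gb=KNOWN_SIZES_GB, headroom_gb=2.0):
    """Return the largest variant whose file fits in memory_gb minus headroom_gb.

    headroom_gb is an assumed allowance for the KV cache and runtime buffers;
    actual usage grows with context length. Returns None if nothing fits.
    """
    fitting = {name: size for name, size in sizes_gb.items()
               if size + headroom_gb <= memory_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

print(pick_variant(48))  # Q8_0
print(pick_variant(24))  # IQ2_XS
print(pick_variant(8))   # None
```

Preferring the largest fitting file follows the general pattern that larger quants preserve more quality; with the full size list available, Q4_K_M remains the suggested default whenever it fits.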
