qihoo360_TinyR1-32B-Preview-v0.1-GGUF

Maintained By
bartowski

qihoo360 TinyR1-32B-Preview GGUF

Property | Value
Original Model | TinyR1-32B-Preview
Quantization | GGUF format with imatrix options
Size Range | 9.96GB - 34.82GB
Author | bartowski

What is qihoo360_TinyR1-32B-Preview-v0.1-GGUF?

This is a collection of GGUF quantized versions of the TinyR1-32B-Preview model, intended to cover different hardware configurations and use cases. The collection features 24 quantization variants, ranging from the high-quality Q8_0 (34.82GB) to the compact IQ2_XS (9.96GB).

Implementation Details

The model uses a specific prompt format: <|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|><|end▁of▁sentence|><|Assistant|>. The quantization was performed using llama.cpp release b4792 with imatrix options, incorporating special optimizations for embed and output weights in certain variants.

  • K-quant variants with the _L suffix (for example Q6_K_L) use Q8_0 for embed and output weights
  • Online repacking support for ARM and AVX CPU inference
  • Newer IQ (I-quant) variants offering better performance for their size
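
As a minimal sketch, the prompt format quoted above can be assembled with a small helper. The function name build_prompt and the constant names are illustrative assumptions, not part of any library; the token strings are copied verbatim from the template in this document. Some runtimes may prepend the begin-of-sentence token on their own, so it is worth checking for duplication before sending the string to a particular backend.

```python
# Special tokens copied verbatim from the prompt format quoted in this document.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"


def build_prompt(prompt: str, system_prompt: str = "") -> str:
    """Fill the template exactly as quoted (hypothetical helper, for illustration)."""
    return f"{BOS}{system_prompt}{USER}{prompt}{ASSISTANT}{EOS}{ASSISTANT}"


if __name__ == "__main__":
    # Example values are placeholders chosen for this sketch.
    print(build_prompt("What is 2 + 2?", system_prompt="You are a helpful assistant."))
```

Keeping the tokens as named constants makes it easy to adjust the helper if a given inference backend already applies part of the template itself.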

Core Capabilities

  • Multiple quantization options optimized for different hardware
  • Support for both CPU and GPU inference
  • Specialized variants for low-RAM environments
  • Enhanced performance through online weight repacking

Frequently Asked Questions

Q: What makes this model unique?

The collection offers 24 quantization variants, so users can trade off model size, quality, and speed to suit their hardware. It includes both K-quants and I-quants, which gives flexibility across different inference backends.

Q: What are the recommended use cases?

For maximum quality, users should choose Q6_K_L or Q8_0 variants. For balanced performance, Q4_K_M is recommended as the default choice. For resource-constrained systems, the IQ2/IQ3 variants offer surprisingly usable performance at minimal size.
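
The guidance above could be turned into a simple selection rule: pick the largest variant whose file fits in the available memory with some headroom. The sketch below assumes a hypothetical helper pick_variant and a 2 GB headroom, which is a rule-of-thumb assumption rather than a figure from this document. Only the two file sizes stated here (Q8_0 and IQ2_XS) are included; sizes for the other variants would have to be filled in from the repository's file listing.

```python
from typing import Dict, Optional

# Only these two sizes are stated in this document; the remaining variants
# are deliberately left out rather than guessed.
KNOWN_SIZES_GB: Dict[str, float] = {
    "Q8_0": 34.82,
    "IQ2_XS": 9.96,
}


def pick_variant(
    available_gb: float,
    sizes_gb: Optional[Dict[str, float]] = None,
    headroom_gb: float = 2.0,  # assumed safety margin, not from the source
) -> Optional[str]:
    """Return the largest variant that fits within available_gb minus headroom."""
    sizes = KNOWN_SIZES_GB if sizes_gb is None else sizes_gb
    budget = available_gb - headroom_gb
    fitting = {name: size for name, size in sizes.items() if size <= budget}
    if not fitting:
        return None  # nothing fits; a smaller model or more memory is needed
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for memory_gb in (48.0, 16.0, 8.0):
        print(memory_gb, "->", pick_variant(memory_gb))
```

File size is used here as a stand-in for quality, which matches the general trend described above but ignores differences between K-quants and I-quants at similar sizes.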
