OLMo-2-1124-13B-Instruct-GGUF

Maintained By
bartowski


  • Parameter Count: 13.7B
  • License: Apache 2.0
  • Format: GGUF
  • Language: English

What is OLMo-2-1124-13B-Instruct-GGUF?

OLMo-2-1124-13B-Instruct-GGUF is a collection of quantized versions of Allen AI's OLMo 2 13B Instruct language model, converted to the GGUF format for efficient deployment. The repository offers quantization options ranging from full F16 precision (27.44GB) down to the highly compressed IQ2_S variant (4.59GB), making it adaptable to a wide range of hardware configurations and performance requirements.

Implementation Details

The model uses a specialized prompt format: "<|endoftext|><|system|>{system_prompt}<|user|>{prompt}<|assistant|>". The quantizations were produced with llama.cpp using the imatrix option, and each variant balances file size against output quality, with specific builds optimized for different hardware architectures, including ARM and AVX CPU inference.

  • Multiple quantization options from Q8_0 to IQ2_S
  • Specialized versions for ARM processors with different optimization levels
  • Embed/output weights variants for enhanced performance
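The chat template above can be assembled programmatically. A minimal sketch in Python (the helper function is a hypothetical convenience; only the special tokens come from the template on this card):

```python
def build_olmo2_prompt(user_prompt: str, system_prompt: str = "") -> str:
    """Assemble a single-turn prompt using the OLMo-2 chat template.

    The special tokens (<|endoftext|>, <|system|>, <|user|>, <|assistant|>)
    are taken from the format shown above; this helper itself is only
    illustrative and not part of the release.
    """
    return (
        "<|endoftext|>"
        f"<|system|>{system_prompt}"
        f"<|user|>{user_prompt}"
        "<|assistant|>"
    )

# The resulting string is what you would feed to llama.cpp as the raw prompt.
print(build_olmo2_prompt("What is GGUF?", "You are a helpful assistant."))
```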

Core Capabilities

  • High-quality text generation with configurable precision levels
  • Optimized performance on various hardware configurations
  • Support for both CPU and GPU inference
  • Flexible deployment options based on available system resources

Frequently Asked Questions

Q: What makes this model unique?

The model offers an extensive range of quantization options with specific optimizations for different hardware architectures, making it highly versatile for various deployment scenarios. The implementation includes special considerations for embed/output weights and ARM-specific optimizations.

Q: What are the recommended use cases?

For users prioritizing quality, the Q6_K_L (11.51GB) variant is recommended. For balanced performance, Q4_K_M (8.35GB) is suggested as the default choice. For systems with limited resources, IQ3_XS (5.80GB) offers a good compromise between size and performance.
