OLMo-2-1124-13B-Instruct-GGUF

Maintained By
bartowski


  • Parameter Count: 13.7B
  • License: Apache 2.0
  • Format: GGUF
  • Language: English

What is OLMo-2-1124-13B-Instruct-GGUF?

OLMo-2-1124-13B-Instruct-GGUF is a collection of quantized versions of Allen AI's OLMo 2 13B Instruct language model, converted to the GGUF format for efficient deployment. The repository offers quantization options ranging from full F16 precision (27.44GB) down to the highly compressed IQ2_S variant (4.59GB), making it adaptable to a wide range of hardware configurations and performance requirements.

Implementation Details

The model uses a specialized prompt format: "<|endoftext|><|system|>{system_prompt}<|user|>{prompt}<|assistant|>". The quantizations were produced with llama.cpp using the imatrix option, and each variant balances file size against output quality, with specific builds optimized for different hardware architectures, including ARM and AVX CPU inference.

  • Multiple quantization options from Q8_0 to IQ2_S
  • Specialized versions for ARM processors with different optimization levels
  • Embed/output weights variants for enhanced performance
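The chat template above can be assembled programmatically. A minimal sketch in Python (the helper function is a hypothetical convenience; only the special tokens come from the template on this card):

```python
def build_olmo2_prompt(user_prompt: str, system_prompt: str = "") -> str:
    """Assemble a single-turn prompt using the OLMo-2 chat template.

    The special tokens (<|endoftext|>, <|system|>, <|user|>, <|assistant|>)
    are taken from the format shown above; this helper itself is only
    illustrative and not part of the release.
    """
    return (
        "<|endoftext|>"
        f"<|system|>{system_prompt}"
        f"<|user|>{user_prompt}"
        "<|assistant|>"
    )

# The resulting string is what you would feed to llama.cpp as the raw prompt.
print(build_olmo2_prompt("What is GGUF?", "You are a helpful assistant."))
```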

Core Capabilities

  • High-quality text generation with configurable precision levels
  • Optimized performance on various hardware configurations
  • Support for both CPU and GPU inference
  • Flexible deployment options based on available system resources

Frequently Asked Questions

Q: What makes this model unique?

The model offers an extensive range of quantization options with specific optimizations for different hardware architectures, making it highly versatile for various deployment scenarios. The implementation includes special considerations for embed/output weights and ARM-specific optimizations.

Q: What are the recommended use cases?

For users prioritizing quality, the Q6_K_L (11.51GB) variant is recommended. For balanced performance, Q4_K_M (8.35GB) is suggested as the default choice. For systems with limited resources, IQ3_XS (5.80GB) offers a good compromise between size and performance.
