RekaAI_reka-flash-3-GGUF

Maintained By
bartowski

| Property | Value |
|----------|-------|
| Author | bartowski |
| Original Model | RekaAI/reka-flash-3 |
| Quantization Framework | llama.cpp (b4867) |
| Size Range | 7.39GB - 41.82GB |

What is RekaAI_reka-flash-3-GGUF?

RekaAI_reka-flash-3-GGUF is a comprehensive collection of quantized versions of the reka-flash-3 model, optimized for different hardware configurations and memory constraints. The collection includes 27 different quantization variants, ranging from full BF16 precision to highly compressed IQ2 formats.

Implementation Details

The model uses llama.cpp's advanced quantization techniques, including both traditional K-quants and newer I-quants. Each variant is optimized using imatrix calibration, with some versions featuring special Q8_0 quantization for embedding and output weights to maintain higher quality in critical model components.

  • Supports multiple quantization methods (Q2-Q8, IQ2-IQ4)
  • Online weight repacking for ARM and AVX CPU inference
  • Specialized prompt format: "human: {system_prompt} {prompt} assistant:"
  • Optimized for LM Studio and llama.cpp-based projects
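The prompt format listed above can be assembled with a small helper; this is an illustrative sketch (the function name and argument handling are my own, not part of the repository):

```python
def build_prompt(prompt: str, system_prompt: str = "") -> str:
    """Assemble a prompt in the format this card specifies:
    "human: {system_prompt} {prompt} assistant:"

    An empty system prompt is simply omitted.
    """
    parts = ["human:"]
    if system_prompt:
        parts.append(system_prompt)
    parts.append(prompt)
    parts.append("assistant:")
    return " ".join(parts)

print(build_prompt("What is quantization?"))
# human: What is quantization? assistant:
```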

Core Capabilities

  • Flexible deployment options from high-quality 41.82GB to compressed 7.39GB variants
  • Hardware-specific optimizations for ARM and AVX architectures
  • Balance between model size and performance through various quantization methods
  • Specialized variants preserving embedding/output weights at higher precision

Frequently Asked Questions

Q: What makes this model unique?

The model offers an exceptionally wide range of quantization options, allowing users to precisely balance quality and resource requirements. It incorporates state-of-the-art quantization techniques, including special handling of embedding/output weights and support for both K-quants and I-quants.

Q: What are the recommended use cases?

For maximum quality, use Q6_K_L (18.74GB) or Q5_K_M (15.64GB). For balanced performance, Q4_K_M (13.61GB) is recommended. For systems with limited RAM, consider IQ3_XXS (9.18GB) or IQ2_XXS (7.39GB) which remain surprisingly usable despite high compression.
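The size guidance above can be turned into a simple selection rule. The sketch below uses the variant sizes quoted on this card; the picker function and its 2GB headroom default are assumptions for illustration, not an official tool from this repository:

```python
# File sizes (GB) for the variants named on this card.
VARIANT_SIZES_GB = {
    "Q6_K_L": 18.74,
    "Q5_K_M": 15.64,
    "Q4_K_M": 13.61,
    "IQ3_XXS": 9.18,
    "IQ2_XXS": 7.39,
}

def pick_variant(ram_gb: float, headroom_gb: float = 2.0):
    """Return the largest listed variant that fits in `ram_gb`,
    leaving `headroom_gb` free for KV cache and runtime overhead.
    Returns None if even the smallest variant does not fit."""
    budget = ram_gb - headroom_gb
    fitting = [(size, name) for name, size in VARIANT_SIZES_GB.items()
               if size <= budget]
    return max(fitting)[1] if fitting else None

print(pick_variant(16))  # Q4_K_M
print(pick_variant(8))   # None
```

A common rule of thumb is to pick a file 1-2GB smaller than your available RAM (or VRAM, if fully offloading to GPU), which is what the headroom parameter models.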
