# RekaAI_reka-flash-3-GGUF

| Property | Value |
|---|---|
| Author | bartowski |
| Original Model | RekaAI/reka-flash-3 |
| Quantization Framework | llama.cpp (b4867) |
| Size Range | 7.39GB - 41.82GB |
## What is RekaAI_reka-flash-3-GGUF?
RekaAI_reka-flash-3-GGUF is a comprehensive collection of quantized versions of the reka-flash-3 model, optimized for different hardware configurations and memory constraints. The collection includes 27 different quantization variants, ranging from full BF16 precision to highly compressed IQ2 formats.
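To get started, a single variant can be fetched directly from the repository. Below is a minimal sketch using `huggingface_hub`; the exact filename follows bartowski's usual `<model>-<quant>.gguf` naming and is an assumption here, so check the repository's file list for the variant you want.

```python
# Minimal sketch: download one quant file from the collection.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="bartowski/RekaAI_reka-flash-3-GGUF",
    filename="RekaAI_reka-flash-3-Q4_K_M.gguf",  # assumed filename; pick any variant
)
print(model_path)  # local path to the downloaded GGUF file
```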
## Implementation Details
The model uses llama.cpp's advanced quantization techniques, including both traditional K-quants and newer I-quants. Each variant is optimized using imatrix calibration, with some versions featuring special Q8_0 quantization for embedding and output weights to maintain higher quality in critical model components.
- Supports multiple quantization methods (Q2-Q8, IQ2-IQ4)
- Online weight repacking for ARM and AVX CPU inference
- Specialized prompt format: `human: {system_prompt} {prompt}` followed on a new line by `assistant:` (applied in the sketch after this list)
- Optimized for LM Studio and llama.cpp-based projects
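Putting these pieces together, here is a minimal sketch of running one of the quantized files with `llama-cpp-python`, applying the prompt format above; the model path, context size, and generation settings are assumptions:

```python
# Sketch: load a downloaded quant and generate with the card's prompt format.
from llama_cpp import Llama

llm = Llama(
    model_path="RekaAI_reka-flash-3-Q4_K_M.gguf",  # assumed local file
    n_ctx=4096,  # assumed context size
)

system_prompt = "You are a helpful assistant."
prompt = "Explain GGUF quantization in one sentence."
formatted = f"human: {system_prompt} {prompt}\nassistant:"

out = llm(formatted, max_tokens=128, stop=["human:"])  # stop string is an assumption
print(out["choices"][0]["text"])
```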
## Core Capabilities
- Flexible deployment options, from the full BF16 variant at 41.82GB down to compressed 7.39GB variants
- Hardware-specific optimizations for ARM and AVX architectures
- Balance between model size and performance through various quantization methods
- Specialized variants that preserve embedding/output weights at higher (Q8_0) precision
## Frequently Asked Questions

**Q: What makes this model unique?**
The model offers an exceptionally wide range of quantization options, allowing users to precisely balance quality and resource requirements. It incorporates state-of-the-art quantization techniques, including special handling of embedding/output weights and support for both K-quants and I-quants.
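If you want to confirm the embedding/output handling in a particular file, the `gguf` Python package maintained in the llama.cpp repository can read tensor metadata; a minimal sketch, assuming a locally downloaded file whose name is hypothetical here:

```python
# Sketch: check which quantization type the embedding/output tensors use.
from gguf import GGUFReader

reader = GGUFReader("RekaAI_reka-flash-3-Q4_K_L.gguf")  # assumed local file
for tensor in reader.tensors:
    # Variants with the special handling should report Q8_0 for these tensors.
    if tensor.name in ("token_embd.weight", "output.weight"):
        print(tensor.name, tensor.tensor_type.name)
```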
**Q: What are the recommended use cases?**
For maximum quality, use Q6_K_L (18.74GB) or Q5_K_M (15.64GB). For balanced performance, Q4_K_M (13.61GB) is recommended. For systems with limited RAM, consider IQ3_XXS (9.18GB) or IQ2_XXS (7.39GB) which remain surprisingly usable despite high compression.
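As a rough illustration of that guidance, here is a hypothetical helper that picks the largest recommended variant fitting a given memory budget; the size table comes from this card, while the function and its headroom figure are assumptions:

```python
# Illustrative helper: map available memory to the variants recommended above.
RECOMMENDED = [  # (variant, file size in GB), largest first
    ("Q6_K_L", 18.74),
    ("Q5_K_M", 15.64),
    ("Q4_K_M", 13.61),
    ("IQ3_XXS", 9.18),
    ("IQ2_XXS", 7.39),
]

def pick_variant(available_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Return the largest recommended quant that fits with some headroom."""
    for name, size in RECOMMENDED:
        if size + headroom_gb <= available_gb:
            return name
    return None  # not enough memory even for IQ2_XXS

print(pick_variant(16.0))  # -> "Q4_K_M" on a 16GB machine
```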