c4ai-command-r-plus-iMat.GGUF

Maintained By
dranger003

Parameter Count: 104B
License: cc-by-nc-4.0
Context Length: 131,072 tokens
Architecture: 64 layers with importance matrix quantization

What is c4ai-command-r-plus-iMat.GGUF?

c4ai-command-r-plus-iMat.GGUF is a quantized release of the C4AI Command R+ model that uses an importance matrix (iMatrix) to preserve quality during compression. This 104B parameter model is offered at multiple quantization levels, from IQ1_S to Q8_0, enabling deployment across a range of hardware configurations while maintaining competitive perplexity.

Implementation Details

The model uses the GGUF format with an importance matrix computed from roughly 100K tokens of wiki.train.raw. It offers multiple quantization options, with perplexity ranging from 4.3845 (FP16) to 8.2530 (IQ1_S), allowing users to trade model size against output quality.

  • Supports context length of 131,072 tokens
  • Implements BPE pre-tokenization
  • Provides quantization levels ranging in size from 21.59 GiB to 193.38 GiB
  • Utilizes importance matrix for enhanced compression efficiency
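
As a practical illustration, the following is a minimal sketch of loading one of the quantized files with llama-cpp-python. The filename, context size, and GPU offload setting are assumptions to adapt to whichever quantization you download and the memory you have available.

```python
# Minimal sketch: load a quantized build of c4ai-command-r-plus-iMat.GGUF
# with llama-cpp-python. The filename below is hypothetical -- point
# model_path at the file (or first shard) you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="c4ai-command-r-plus-iq4_xs-00001-of-00002.gguf",  # hypothetical shard name
    n_ctx=16384,      # the model supports up to 131,072 tokens; smaller windows use less memory
    n_gpu_layers=-1,  # offload all 64 layers to the GPU when it fits; reduce on smaller cards
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the trade-offs of importance matrix quantization."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

For quantizations distributed as multiple shards, recent llama.cpp builds can load the model from the first shard alone, which is why only one path appears above.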

Core Capabilities

  • Multilingual support for 10 languages including English, French, Spanish, and others
  • Advanced Retrieval Augmented Generation (RAG)
  • Multi-step tool use capabilities
  • Specialized in reasoning, summarization, and question answering
  • Supports split model loading for efficient resource management

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its use of importance matrix quantization, which enables highly efficient compression while preserving output quality. It offers quantization options for a range of hardware configurations, with IQ4_XS, IQ3_M/IQ3_S, and IQ2_M serving as sweet spots that balance size against performance.
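
One way to apply this guidance is to compare each file's on-disk size against your combined RAM/VRAM budget, since the weights' footprint in memory is close to their size on disk. The helper below is a sketch under that assumption; the quant names and the two sizes in the example call are placeholders taken from the size range quoted earlier, not the repository's actual per-file figures.

```python
# Sketch: pick the largest quantization that fits a memory budget.
# Replace the sizes in the example call with the real per-file figures from
# the repository listing; the two values used here are just the extremes
# quoted above (21.59 GiB and 193.38 GiB) under hypothetical names.
def pick_quant(sizes_gib: dict[str, float], budget_gib: float, overhead_gib: float = 4.0) -> str | None:
    """Return the largest quant whose weights plus a rough KV-cache/runtime
    allowance fit within the budget, or None if nothing fits."""
    fitting = {name: size for name, size in sizes_gib.items() if size + overhead_gib <= budget_gib}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant({"smallest_quant": 21.59, "largest_quant": 193.38}, budget_gib=64.0))
# -> "smallest_quant" on a 64 GiB budget; a larger budget unlocks larger quants
```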

Q: What are the recommended use cases?

The model excels at sophisticated tasks requiring multi-step reasoning, retrieval-augmented generation, and tool use across multiple languages. It is particularly well suited to applications that need strong reasoning, summarization, and question answering while operating under a range of hardware constraints.
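
For instance, a lightweight retrieval-augmented flow can be sketched by pasting retrieved passages into the user turn and letting the chat template embedded in the GGUF handle formatting. Everything below (the passages, the question, and the llm object from the earlier loading sketch) is illustrative, not part of the model card.

```python
# Sketch of a retrieval-augmented call, reusing the `llm` object from the
# loading example above. The passages and question are placeholders for
# whatever your retriever returns.
retrieved_passages = [
    "Passage 1: ...",  # replace with retrieved text
    "Passage 2: ...",
]
context = "\n\n".join(retrieved_passages)
question = "What does the source material say about this topic?"

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Answer using only the provided passages and say which passage supports each claim."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
    max_tokens=400,
    temperature=0.3,
)
print(response["choices"][0]["message"]["content"])
```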
