granite-3.0-3b-a800m-base-GGUF

Maintained By
QuantFactory

Granite-3.0-3B-A800M-Base-GGUF

Parameter Count: 3.3B total (800M active)
Model Type: Decoder-only sparse MoE
License: Apache 2.0
Training Data: 10T tokens
Supported Languages: 12 languages, including English, German, and Spanish

What is granite-3.0-3b-a800m-base-GGUF?

Granite-3.0-3B-A800M-Base-GGUF is a GGUF-quantized release of IBM's Granite-3.0-3B-A800M-Base, a language model built on a sparse Mixture of Experts (MoE) architecture. Each token is routed to a subset of the model's 40 experts, so only about 800M of its 3.3B total parameters are active per forward pass, letting it approach the quality of larger dense models at a fraction of the per-token compute.
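Because the weights ship in GGUF format, the model can be run locally with llama.cpp-compatible tooling. Below is a minimal sketch using the llama-cpp-python bindings; the filename and its quantization suffix are assumptions here and should match whichever GGUF file you download from the repository.

    # Minimal sketch: loading and prompting the GGUF model with llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="granite-3.0-3b-a800m-base.Q4_K_M.gguf",  # assumed filename/quant level
        n_ctx=4096,  # matches the model's 4096-token sequence length
    )

    # This is a base (non-instruct) model, so use completion-style prompts.
    output = llm("The three primary colors are", max_tokens=32, temperature=0.7)
    print(output["choices"][0]["text"])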

Implementation Details

The model features a decoder-only architecture with 32 layers, 24 attention heads, and an embedding size of 1536. It employs fine-grained experts, dropless token routing, and a load-balancing loss to keep expert utilization even. Training proceeded in two stages over 10T tokens, focusing first on broad knowledge acquisition and then on task-specific enhancement. A sketch of the routing step follows the specification list below.

  • Embedding size: 1536
  • Attention heads: 24 (8 KV heads)
  • Number of experts: 40
  • Sequence length: 4096
  • Position embedding: RoPE
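To make the routing idea concrete, here is a generic top-k MoE routing sketch in NumPy: each token's hidden state is scored against every expert, the k highest-scoring experts are kept for that token (dropless, in the sense that no token is discarded for capacity reasons), and the kept scores are renormalized into gate weights. The choice of k = 8 experts per token and the NumPy implementation are illustrative assumptions, not details taken from this card.

    import numpy as np

    def route_tokens(hidden, router_weights, top_k=8):
        # hidden:         (num_tokens, d_model) token representations
        # router_weights: (d_model, num_experts) learned router projection
        logits = hidden @ router_weights                     # (tokens, experts)
        # Keep the top_k highest-scoring experts for every token; no token is dropped.
        top_idx = np.argsort(logits, axis=-1)[:, -top_k:]    # (tokens, top_k)
        top_logits = np.take_along_axis(logits, top_idx, axis=-1)
        # Softmax over the selected experts only -> gate weights that sum to 1.
        gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
        gates /= gates.sum(axis=-1, keepdims=True)
        return top_idx, gates

    # Example with this model's nominal shapes: embedding size 1536, 40 experts.
    rng = np.random.default_rng(0)
    hidden = rng.standard_normal((4, 1536))
    router = rng.standard_normal((1536, 40))
    experts, gates = route_tokens(hidden, router)
    print(experts.shape, gates.sum(axis=-1))                 # (4, 8) [1. 1. 1. 1.]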

Core Capabilities

  • Strong performance in text generation tasks (72.79% on HellaSwag)
  • Robust multilingual support across 12 languages
  • Effective in reasoning tasks (38.93% on BBH)
  • Capable of code generation (26.83% on HumanEval)
  • Mathematical problem-solving (35.86% on GSM8K)

Frequently Asked Questions

Q: What makes this model unique?

Its sparse MoE architecture activates only about 800M of its 3.3B total parameters for each token, so it can approach the quality of larger dense models while keeping per-token compute closer to that of a small model, which makes it more efficient to deploy.
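As a rough back-of-envelope check of that claim (assuming per-token compute scales roughly with the number of active parameters):

    # Back-of-envelope: fraction of the weights touched per token.
    total_params = 3.3e9
    active_params = 0.8e9
    print(f"Active fraction per token: {active_params / total_params:.0%}")  # ~24%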

Q: What are the recommended use cases?

The model excels in text-to-text generation tasks including summarization, classification, extraction, and question-answering. It's particularly suitable for multilingual applications and can serve as a foundation for specialized fine-tuning.
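As an illustration of prompt-based use with the GGUF build, the snippet below reuses the llm object from the loading sketch above; the single-shot summarization prompt is purely an example.

    # Illustrative completion-style prompt for a base (non-instruct) model.
    prompt = (
        "Article: The city council approved a new budget that increases funding "
        "for public transit and road maintenance over the next two years.\n"
        "Summary:"
    )
    result = llm(prompt, max_tokens=48, temperature=0.3, stop=["\n"])
    print(result["choices"][0]["text"].strip())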
