Granite-3.0-2B-Base GGUF
| Property | Value |
|---|---|
| Parameter Count | 2.53B |
| License | Apache 2.0 |
| Architecture | Decoder-only dense transformer |
| Context Length | 4096 tokens |
| Training Tokens | 12 trillion |
What is granite-3.0-2b-base-GGUF?
Granite-3.0-2b-base-GGUF is a quantized, GGUF-format build of IBM's Granite-3.0-2B-Base language model, intended for efficient local text generation with llama.cpp-compatible runtimes. The base model balances performance and efficiency, trained through a two-stage process on 12 trillion tokens.
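As a minimal sketch of how a GGUF file like this is typically loaded, the snippet below uses llama-cpp-python. The local filename and the quantization suffix (Q4_K_M) are illustrative assumptions, not part of the official release.

```python
# Minimal sketch: loading a GGUF build of the model with llama-cpp-python.
# The filename and quantization level (Q4_K_M) are assumptions for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="granite-3.0-2b-base.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,  # matches the model's context length
)

out = llm(
    "The three primary colors are",
    max_tokens=32,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```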
Implementation Details
The model uses a decoder-only architecture with 40 layers and a 2048-dimensional embedding size, combining grouped-query attention (GQA) with rotary positional embeddings (RoPE). Its MLP layers use SwiGLU activation with a hidden size of 8192, and attention is computed with 32 query heads sharing 8 KV heads; these hyperparameters are summarized in the configuration sketch after the list below.
- 2048-dimensional embeddings across 40 transformer layers
- Specialized two-stage training approach with diverse data sources
- Supports 12 languages including English, German, Spanish, and Japanese
- Optimized for both general text generation and specific task performance
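For reference, the architecture hyperparameters listed above can be collected into a single configuration object. This is a descriptive sketch of the published specifications; the class and field names are assumptions and do not mirror IBM's implementation.

```python
from dataclasses import dataclass

# Descriptive sketch of the published Granite-3.0-2B-Base hyperparameters.
# Class and field names are illustrative; they do not come from IBM's code.
@dataclass
class GraniteBaseConfig:
    num_layers: int = 40           # transformer blocks
    hidden_size: int = 2048        # embedding dimension
    mlp_hidden_size: int = 8192    # SwiGLU feed-forward width
    num_attention_heads: int = 32  # query heads
    num_kv_heads: int = 8          # shared key/value heads (GQA)
    context_length: int = 4096     # maximum sequence length
    activation: str = "swiglu"
    position_embedding: str = "rope"

config = GraniteBaseConfig()
print(config)
```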
Core Capabilities
- Strong performance in commonsense tasks (WinoGrande: 74.90%, PIQA: 79.27%)
- Effective reading comprehension (BoolQ: 81.35%)
- Reasonable mathematics capabilities (GSM8K: 47.23%)
- Code generation abilities (HumanEval: 38.41%)
Frequently Asked Questions
Q: What makes this model unique?
Its distinctive features are an efficient architecture and a comprehensive two-stage training approach: roughly 10 trillion tokens in the first stage and 2 trillion higher-quality tokens in the second. This yields strong performance across a range of tasks while keeping the model relatively compact.
Q: What are the recommended use cases?
The model excels in text-to-text generation tasks including summarization, classification, extraction, and question-answering. It's particularly suitable for applications requiring balanced performance across multiple domains and can serve as a foundation for specialized fine-tuning.
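As a usage illustration for one of these tasks, the sketch below frames a summarization prompt for the quantized base model. Because this is a base (non-instruct) checkpoint, the plain-text prompt framing is an assumption rather than an official template, and the file path is hypothetical.

```python
from llama_cpp import Llama

# Hypothetical local GGUF path; the prompt framing below is an assumption
# suited to a base model, not an official chat or instruction template.
llm = Llama(model_path="granite-3.0-2b-base.Q4_K_M.gguf", n_ctx=4096)

document = "GGUF is a binary file format for storing quantized language models."
prompt = f"Document: {document}\n\nSummary:"

result = llm(prompt, max_tokens=64, temperature=0.2, stop=["\n\n"])
print(result["choices"][0]["text"].strip())
```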