# Granite-3.1-2B-Instruct
| Property | Value |
|---|---|
| Developer | IBM Granite Team |
| License | Apache 2.0 |
| Release Date | December 18th, 2024 |
| Parameters | 2.5B |
| Context Length | 128K tokens |
| Languages | 12 languages, including English, German, and Spanish |
## What is granite-3.1-2b-instruct?
Granite-3.1-2B-Instruct is a language model developed by IBM's Granite team, featuring a 2.5B-parameter architecture optimized for long-context understanding and generation. Built on a decoder-only dense transformer architecture, it incorporates modern components such as grouped-query attention (GQA), rotary position embeddings (RoPE), and SwiGLU activations, enabling robust performance across multiple languages and tasks.
## Implementation Details
The model uses a 40-layer architecture with a 2048-dimensional embedding and 32 attention heads (8 KV heads for grouped-query attention). It was trained on 12T tokens using IBM's Blue Vela supercomputing cluster with NVIDIA H100 GPUs, and employs shared input/output embeddings and RMSNorm normalization.
- Embedding size: 2048
- Number of layers: 40
- Attention heads: 32 (8 KV heads)
- MLP hidden size: 8192
- Sequence length: 128K
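As a rough sanity check, the hyperparameters above approximately reproduce the stated 2.5B parameter count. The sketch below assumes a SwiGLU MLP with three projection matrices, a head dimension of 64 (2048 / 32), no biases, and a vocabulary of roughly 49K tokens; the vocabulary size is not stated in this card, so treat it as an assumption.

```python
# Rough parameter-count estimate from the listed hyperparameters.
# Assumptions (not stated in this card): vocab_size ~49K, head_dim = 64,
# SwiGLU MLP with gate/up/down projections, biases ignored.
n_layers = 40
d_model = 2048
n_heads = 32
n_kv_heads = 8          # grouped-query attention
d_mlp = 8192
vocab_size = 49_152     # assumed; check the tokenizer for the real value

head_dim = d_model // n_heads                     # 64
attn = (d_model * n_heads * head_dim              # Q projection
        + 2 * d_model * n_kv_heads * head_dim     # K and V projections
        + n_heads * head_dim * d_model)           # output projection
mlp = 3 * d_model * d_mlp                         # gate, up, down (SwiGLU)
per_layer = attn + mlp
embedding = vocab_size * d_model                  # shared input/output embedding
total = n_layers * per_layer + embedding

print(f"{total / 1e9:.2f}B parameters")           # ~2.53B, close to the stated 2.5B
```

The MLP dominates the per-layer budget (about 50M of 61M parameters per layer), which is typical for SwiGLU transformers at this scale.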
## Core Capabilities
- Long-context document summarization and QA
- Text classification and extraction
- Multilingual dialogue processing
- Code-related tasks
- Function-calling capabilities
- Retrieval Augmented Generation (RAG)
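To illustrate the function-calling capability, the sketch below builds the kind of tool schema and message list typically passed to a chat model. The tool name, parameters, and messages are hypothetical examples for illustration, not part of this model card.

```python
import json

# Illustrative function-calling setup. The tool ("get_weather") and the
# conversation below are hypothetical examples, not from the model card.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",           # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You can call tools when helpful."},
    {"role": "user", "content": "What's the weather in Berlin?"},
]

# In practice these structures would be rendered into a prompt via the
# tokenizer's chat template and the model would emit a tool call to parse.
payload = json.dumps({"messages": messages, "tools": tools})
parsed = json.loads(payload)
```

The schema follows the common JSON-Schema-style tool format used by chat templating APIs; check the model's chat template for the exact format it expects.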
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its 128K context window and an architecture balanced for both quality and efficiency. It achieves strong results across benchmarks, with an average score of 60.79 on the Hugging Face Open LLM Leaderboard V1.
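One practical consequence of pairing the 128K window with 8 KV heads is a relatively modest KV cache. A rough estimate, assuming fp16 cache entries and a head dimension of 64 (2048 / 32); these are back-of-the-envelope numbers, not measurements:

```python
# Rough KV-cache size at the full 128K context, assuming fp16 (2 bytes)
# per value and head_dim = 2048 / 32 = 64. Estimates, not measured numbers.
n_layers = 40
n_kv_heads = 8
head_dim = 64
seq_len = 128 * 1024
bytes_per_value = 2                    # fp16

# Factor of 2 covers both the K and the V cache.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value
print(f"{kv_bytes / 2**30:.1f} GiB")   # 10.0 GiB

# With 32 full KV heads (no GQA) the cache would be 4x larger (~40 GiB).
full_mha_bytes = kv_bytes * (32 // n_kv_heads)
```

Grouped-query attention thus cuts long-context memory by 4x here relative to full multi-head attention, which is a large part of why the 128K window is practical.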
Q: What are the recommended use cases?
The model is well suited to business applications that require long-context understanding, multilingual capabilities, and general instruction following. It is particularly effective for document processing, summarization, and complex Q&A across multiple languages.
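For documents that exceed even a 128K-token window, a simple overlapping-chunk pass is a common pre-processing step before summarization or Q&A. A minimal sketch; word counts stand in for token counts, and the budget and overlap values are arbitrary examples:

```python
def chunk_text(text: str, max_words: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks.

    Words approximate tokens here; a real pipeline would count with the
    model's tokenizer. max_words and overlap are example values.
    """
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = ("word " * 1000).strip()
chunks = chunk_text(doc)
print(len(chunks))  # 3 overlapping chunks for a 1000-word document
```

Each chunk can then be summarized or queried independently, with the overlap preserving context across chunk boundaries.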