# Granite-3.1-2B-Instruct
| Property | Value |
|---|---|
| Developer | IBM Granite Team |
| License | Apache 2.0 |
| Release Date | December 18th, 2024 |
| Parameters | 2.5B |
| Context Length | 128K tokens |
| Languages | 12 languages, including English, German, and Spanish |
## What is granite-3.1-2b-instruct?
Granite-3.1-2B-Instruct is a language model developed by IBM's Granite team, featuring a 2.5B-parameter architecture optimized for long-context understanding and generation. Built on a decoder-only dense transformer architecture, it incorporates modern components such as grouped-query attention (GQA), rotary position embeddings (RoPE), and SwiGLU activations, enabling robust performance across multiple languages and tasks.
## Implementation Details
The model uses a 40-layer architecture with a 2048-dimensional embedding and 32 attention heads (8 KV heads for grouped-query attention). It was trained on 12T tokens using IBM's Blue Vela supercomputing cluster with NVIDIA H100 GPUs, and employs shared input/output embeddings and RMSNorm normalization.
- Embedding size: 2048
- Number of layers: 40
- Attention heads: 32 (8 KV heads)
- MLP hidden size: 8192
- Sequence length: 128K
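As a rough sanity check, the hyperparameters above approximately reproduce the stated 2.5B parameter count. The sketch below assumes a SwiGLU MLP with three projection matrices, a head dimension of 64 (2048 / 32), no biases, and a vocabulary of roughly 49K tokens; the vocabulary size is not stated in this card, so treat it as an assumption.

```python
# Rough parameter-count estimate from the listed hyperparameters.
# Assumptions (not stated in this card): vocab_size ~49K, head_dim = 64,
# SwiGLU MLP with gate/up/down projections, biases ignored.
n_layers = 40
d_model = 2048
n_heads = 32
n_kv_heads = 8          # grouped-query attention
d_mlp = 8192
vocab_size = 49_152     # assumed; check the tokenizer for the real value

head_dim = d_model // n_heads                     # 64
attn = (d_model * n_heads * head_dim              # Q projection
        + 2 * d_model * n_kv_heads * head_dim     # K and V projections
        + n_heads * head_dim * d_model)           # output projection
mlp = 3 * d_model * d_mlp                         # gate, up, down (SwiGLU)
per_layer = attn + mlp
embedding = vocab_size * d_model                  # shared input/output embedding
total = n_layers * per_layer + embedding

print(f"{total / 1e9:.2f}B parameters")           # ~2.53B, close to the stated 2.5B
```

The MLP dominates the per-layer budget (about 50M of 61M parameters per layer), which is typical for SwiGLU transformers at this scale.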
## Core Capabilities
- Long-context document summarization and QA
- Text classification and extraction
- Multilingual dialogue processing
- Code-related tasks
- Function-calling capabilities
- Retrieval Augmented Generation (RAG)
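To illustrate the function-calling capability, the sketch below builds the kind of tool schema and message list typically passed to a chat model. The tool name, parameters, and messages are hypothetical examples for illustration, not part of this model card.

```python
import json

# Illustrative function-calling setup. The tool ("get_weather") and the
# conversation below are hypothetical examples, not from the model card.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",           # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You can call tools when helpful."},
    {"role": "user", "content": "What's the weather in Berlin?"},
]

# In practice these structures would be rendered into a prompt via the
# tokenizer's chat template and the model would emit a tool call to parse.
payload = json.dumps({"messages": messages, "tools": tools})
parsed = json.loads(payload)
```

The schema follows the common JSON-Schema-style tool format used by chat templating APIs; check the model's chat template for the exact format it expects.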
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its 128K context window and an architecture balanced for both quality and efficiency. It achieves strong results across benchmarks, with an average score of 60.79 on the Hugging Face Open LLM Leaderboard V1.
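One practical consequence of pairing the 128K window with 8 KV heads is a relatively modest KV cache. A rough estimate, assuming fp16 cache entries and a head dimension of 64 (2048 / 32); these are back-of-the-envelope numbers, not measurements:

```python
# Rough KV-cache size at the full 128K context, assuming fp16 (2 bytes)
# per value and head_dim = 2048 / 32 = 64. Estimates, not measured numbers.
n_layers = 40
n_kv_heads = 8
head_dim = 64
seq_len = 128 * 1024
bytes_per_value = 2                    # fp16

# Factor of 2 covers both the K and the V cache.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value
print(f"{kv_bytes / 2**30:.1f} GiB")   # 10.0 GiB

# With 32 full KV heads (no GQA) the cache would be 4x larger (~40 GiB).
full_mha_bytes = kv_bytes * (32 // n_kv_heads)
```

Grouped-query attention thus cuts long-context memory by 4x here relative to full multi-head attention, which is a large part of why the 128K window is practical.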
Q: What are the recommended use cases?
The model is well suited to business applications that require long-context understanding, multilingual capabilities, and general instruction following. It is particularly effective for document processing, summarization, and complex Q&A across multiple languages.
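For documents that exceed even a 128K-token window, a simple overlapping-chunk pass is a common pre-processing step before summarization or Q&A. A minimal sketch; word counts stand in for token counts, and the budget and overlap values are arbitrary examples:

```python
def chunk_text(text: str, max_words: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks.

    Words approximate tokens here; a real pipeline would count with the
    model's tokenizer. max_words and overlap are example values.
    """
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = ("word " * 1000).strip()
chunks = chunk_text(doc)
print(len(chunks))  # 3 overlapping chunks for a 1000-word document
```

Each chunk can then be summarized or queried independently, with the overlap preserving context across chunk boundaries.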