Granite-3.1-8B-Instruct
| Property | Value |
|---|---|
| Parameters | 8.1B |
| License | Apache 2.0 |
| Context Length | 128K tokens |
| Release Date | December 18, 2024 |
| Training Tokens | 12T |
What is Granite-3.1-8B-Instruct?
Granite-3.1-8B-Instruct is an instruction-tuned language model developed by IBM's Granite team, designed for long-context processing and multilingual use. Built on a dense, decoder-only transformer architecture, it supports 12 languages and handles a broad range of NLP tasks, from summarization and QA to code generation. The model posts strong results on the Hugging Face Open LLM Leaderboard, with a 71.31% average score across its benchmarks.
Implementation Details
The model's architecture features an embedding size of 4096, 40 transformer layers, and 32 attention heads paired with 8 key-value heads. It combines grouped-query attention (GQA) with rotary position embeddings (RoPE), and uses SwiGLU activations in its MLP blocks. The model was trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs.
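RoPE encodes position by rotating each pair of dimensions in the query and key vectors, so attention scores depend only on the relative distance between tokens — a property that supports scaling to long contexts like 128K. A minimal NumPy sketch of the idea (illustrative only, not the actual Granite implementation; the per-head dimension 4096 / 32 = 128 is taken from the figures above):

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to one head vector of even length."""
    d = x.shape[-1]
    # One rotation frequency per pair of dimensions.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin  # rotate each 2-D pair by pos * freq
    out[1::2] = x1 * sin + x2 * cos
    return out

# head_dim = 4096 embedding size / 32 heads = 128 for this model
q = np.random.default_rng(0).standard_normal(128)
k = np.random.default_rng(1).standard_normal(128)

# Rotation preserves vector norms ...
assert np.isclose(np.linalg.norm(rope(q, 5)), np.linalg.norm(q))
# ... and the q·k attention score depends only on relative position:
s1 = rope(q, 10) @ rope(k, 3)      # distance 7
s2 = rope(q, 107) @ rope(k, 100)   # distance 7 again
assert np.isclose(s1, s2)
```

Because each 2-D pair is rotated by an angle proportional to position, the dot product between a rotated query and key reduces to a function of their positional difference alone.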
- Dense transformer architecture with 8.1B parameters
- 128K token context window
- Trained on 12T tokens
- Implements GQA and RoPE position embedding
- Uses RMSNorm and shared input/output embeddings
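The practical payoff of GQA at a 128K context is KV-cache memory: with 8 KV heads instead of 32, the cache shrinks by 4x. A back-of-the-envelope calculation from the figures above (assuming a bf16 cache at 2 bytes per element, which is an assumption, not stated in the model card):

```python
# KV-cache size per token = layers * kv_heads * head_dim * 2 (K and V) * bytes
layers, n_heads, kv_heads = 40, 32, 8
head_dim = 4096 // n_heads          # 128
bytes_per_elem = 2                  # assuming a bf16 cache

per_token = layers * kv_heads * head_dim * 2 * bytes_per_elem
print(per_token)                    # 163840 bytes (160 KiB) per cached token
print(per_token * 131_072 / 2**30)  # 20.0 GiB at the full 128K context

# Full multi-head attention (32 KV heads) would need a 4x larger cache:
mha_per_token = layers * n_heads * head_dim * 2 * bytes_per_elem
print(mha_per_token // per_token)   # 4
```

At 160 KiB per token, a single full-length 128K sequence needs roughly 20 GiB of KV cache; without GQA it would be about 80 GiB.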
Core Capabilities
- Long document summarization and QA
- Text classification and extraction
- Retrieval Augmented Generation (RAG)
- Code-related tasks and function calling
- Multilingual dialogue support
- Advanced summarization capabilities
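For function calling, inputs are typically supplied as a chat plus a list of tool specifications. The sketch below uses a hypothetical `get_weather` tool in the JSON-schema style of the Hugging Face chat-template tools convention; the exact prompt format Granite expects is defined by its tokenizer's chat template:

```python
import json

# Hypothetical tool definition -- illustrative only, not part of the model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# In a real pipeline these would be passed to the model's tokenizer, e.g.
# tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True)
payload = json.dumps({"messages": messages, "tools": tools}, indent=2)
print(payload)
```

The model is expected to respond with a structured tool call naming the function and its arguments, which the calling application then executes.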
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its 128K context window, multilingual support across 12 languages, and an architecture combining GQA and RoPE. It delivers strong benchmark results, including a 71.31% average on the Hugging Face Open LLM Leaderboard, and is specifically optimized for long-context tasks.
Q: What are the recommended use cases?
The model excels in business applications requiring long document processing, multilingual support, and complex text analysis. It's particularly suitable for enterprises needing document summarization, QA systems, and RAG implementations.