Granite-3.1-8B-Instruct
| Property | Value |
|---|---|
| Parameters | 8.1B |
| License | Apache 2.0 |
| Context Length | 128K tokens |
| Release Date | December 18, 2024 |
| Training Tokens | 12T |
What is Granite-3.1-8B-Instruct?
Granite-3.1-8B-Instruct is an instruction-tuned language model developed by IBM's Granite team, designed for long-context processing and multilingual use. Built on a dense, decoder-only transformer architecture, it supports 12 languages and handles a broad range of NLP tasks, from summarization and QA to code generation. The model posts strong results on the Hugging Face Open LLM Leaderboard, with a 71.31% average score across its benchmarks.
Implementation Details
The model's architecture features an embedding size of 4096, 40 transformer layers, and 32 attention heads paired with 8 key-value heads. It combines grouped-query attention (GQA) with rotary position embeddings (RoPE), and uses SwiGLU activations in its MLP blocks. The model was trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs.
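RoPE encodes position by rotating each pair of dimensions in the query and key vectors, so attention scores depend only on the relative distance between tokens — a property that supports scaling to long contexts like 128K. A minimal NumPy sketch of the idea (illustrative only, not the actual Granite implementation; the per-head dimension 4096 / 32 = 128 is taken from the figures above):

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to one head vector of even length."""
    d = x.shape[-1]
    # One rotation frequency per pair of dimensions.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin  # rotate each 2-D pair by pos * freq
    out[1::2] = x1 * sin + x2 * cos
    return out

# head_dim = 4096 embedding size / 32 heads = 128 for this model
q = np.random.default_rng(0).standard_normal(128)
k = np.random.default_rng(1).standard_normal(128)

# Rotation preserves vector norms ...
assert np.isclose(np.linalg.norm(rope(q, 5)), np.linalg.norm(q))
# ... and the q·k attention score depends only on relative position:
s1 = rope(q, 10) @ rope(k, 3)      # distance 7
s2 = rope(q, 107) @ rope(k, 100)   # distance 7 again
assert np.isclose(s1, s2)
```

Because each 2-D pair is rotated by an angle proportional to position, the dot product between a rotated query and key reduces to a function of their positional difference alone.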
- Dense transformer architecture with 8.1B parameters
- 128K token context window
- Trained on 12T tokens
- Implements GQA and RoPE position embedding
- Uses RMSNorm and shared input/output embeddings
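The practical payoff of GQA at a 128K context is KV-cache memory: with 8 KV heads instead of 32, the cache shrinks by 4x. A back-of-the-envelope calculation from the figures above (assuming a bf16 cache at 2 bytes per element, which is an assumption, not stated in the model card):

```python
# KV-cache size per token = layers * kv_heads * head_dim * 2 (K and V) * bytes
layers, n_heads, kv_heads = 40, 32, 8
head_dim = 4096 // n_heads          # 128
bytes_per_elem = 2                  # assuming a bf16 cache

per_token = layers * kv_heads * head_dim * 2 * bytes_per_elem
print(per_token)                    # 163840 bytes (160 KiB) per cached token
print(per_token * 131_072 / 2**30)  # 20.0 GiB at the full 128K context

# Full multi-head attention (32 KV heads) would need a 4x larger cache:
mha_per_token = layers * n_heads * head_dim * 2 * bytes_per_elem
print(mha_per_token // per_token)   # 4
```

At 160 KiB per token, a single full-length 128K sequence needs roughly 20 GiB of KV cache; without GQA it would be about 80 GiB.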
Core Capabilities
- Long document summarization and QA
- Text classification and extraction
- Retrieval Augmented Generation (RAG)
- Code-related tasks and function calling
- Multilingual dialogue support
- Advanced summarization capabilities
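For function calling, inputs are typically supplied as a chat plus a list of tool specifications. The sketch below uses a hypothetical `get_weather` tool in the JSON-schema style of the Hugging Face chat-template tools convention; the exact prompt format Granite expects is defined by its tokenizer's chat template:

```python
import json

# Hypothetical tool definition -- illustrative only, not part of the model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# In a real pipeline these would be passed to the model's tokenizer, e.g.
# tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True)
payload = json.dumps({"messages": messages, "tools": tools}, indent=2)
print(payload)
```

The model is expected to respond with a structured tool call naming the function and its arguments, which the calling application then executes.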
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its 128K context window, multilingual support across 12 languages, and an architecture combining GQA and RoPE. It delivers strong benchmark results, including a 71.31% average on the Hugging Face Open LLM Leaderboard, and is specifically optimized for long-context tasks.
Q: What are the recommended use cases?
The model excels in business applications requiring long document processing, multilingual support, and complex text analysis. It's particularly suitable for enterprises needing document summarization, QA systems, and RAG implementations.