# Falcon3-3B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 3 billion |
| Context Length | 32K tokens |
| Languages | English, French, Spanish, Portuguese |
| License | TII Falcon-LLM License 2.0 |
| Release Date | December 2024 |
## What is Falcon3-3B-Instruct?
Falcon3-3B-Instruct is an instruction-tuned language model developed by the Technology Innovation Institute (TII). It is a pruned and optimized derivative of Falcon3-7B-Base, designed for strong reasoning, STEM tasks, and multilingual use. The model was trained on 100 gigatokens of diverse data and then post-trained on 1.2 million samples of specialized instruction content.
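For orientation, here is a minimal inference sketch. It assumes the checkpoint is published on the Hugging Face Hub as `tiiuae/Falcon3-3B-Instruct` and that a recent `transformers` (with chat-template support) plus `accelerate` are installed; treat it as an illustrative starting point rather than an official quickstart.

```python
# Minimal chat inference sketch. Assumes the checkpoint is published on the
# Hugging Face Hub as "tiiuae/Falcon3-3B-Instruct" and that a recent
# `transformers` (with chat-template support) and `accelerate` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Instruct"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~2 bytes/param: roughly 6 GB of weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain grouped query attention in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

At bfloat16 precision the 3 billion parameters occupy roughly 6 GB, so the model should fit comfortably on a single mid-range GPU.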
## Implementation Details
The model implements a transformer-based, causal decoder-only architecture with 22 decoder blocks. It uses Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, which shrinks the KV cache and speeds up inference. The architecture incorporates SwiGLU activations and RMSNorm, with a high RoPE base frequency (theta = 1,000,042) for enhanced long-context understanding. Key figures, which can be cross-checked with the config sketch after this list:
- Wide head dimension of 256
- 131K vocabulary size
- 32K context length support
- Trained on 1,024 H100 GPUs
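A short sketch for verifying the numbers above against the published configuration, assuming the checkpoint uses standard Llama-style config field names (the released file may differ slightly):

```python
# Sketch: cross-check the architecture figures against the published config.
# Assumes standard Llama-style config field names; the released checkpoint
# may name them differently.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("tiiuae/Falcon3-3B-Instruct")  # assumed Hub repo id

print(cfg.num_hidden_layers)        # decoder blocks, expected 22
print(cfg.num_attention_heads)      # query heads, expected 12
print(cfg.num_key_value_heads)      # key-value heads (GQA), expected 4
print(getattr(cfg, "head_dim", cfg.hidden_size // cfg.num_attention_heads))  # expected 256
print(cfg.vocab_size)               # expected ~131K
print(cfg.rope_theta)               # expected 1000042
print(cfg.max_position_embeddings)  # expected 32768 (32K)
```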
## Core Capabilities
- Strong performance in STEM and mathematical reasoning (78% on GSM8K with chain-of-thought prompting; see the sketch after this list)
- Exceptional results in scientific understanding (95.5% on SciQ)
- Robust multilingual support across four languages
- Advanced reasoning capabilities (45.4% on the BBH benchmark)
- Effective instruction following, with an average MT-Bench score of 7.2
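The GSM8K figure above was obtained with chain-of-thought prompting. The sketch below shows one way to elicit step-by-step reasoning via the `transformers` pipeline; the system prompt and word problem are invented for illustration and are not the evaluation setup behind the reported score.

```python
# Illustrative chain-of-thought prompt for a GSM8K-style word problem.
# The system prompt and the problem are invented for this sketch.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",  # assumed Hub repo id
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Reason step by step, then state the final answer on its own line."},
    {"role": "user", "content": "A library holds 120 books. It lends out 45 and gets 18 back. How many books are on the shelves now?"},
]
result = generator(messages, max_new_tokens=300, do_sample=False)
# With chat-format input, the pipeline appends the assistant turn to the conversation.
print(result[0]["generated_text"][-1]["content"])
```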
## Frequently Asked Questions
**Q: What makes this model unique?**
Falcon3-3B-Instruct stands out for an efficient architecture that achieves strong performance despite its compact size. It particularly excels at STEM and scientific tasks, outperforming larger models on specific benchmarks such as SciQ and MATH Level-5.
**Q: What are the recommended use cases?**
The model is particularly well suited to scientific and mathematical applications, multilingual content generation, and tasks requiring complex reasoning. It is a good fit for applications that need strong STEM performance at moderate computational cost; a brief multilingual sketch follows below.
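As a final illustration of the multilingual use case, a short sketch (again assuming the `tiiuae/Falcon3-3B-Instruct` Hub id) prompting the model in French:

```python
# Sketch of the multilingual use case: a French prompt asking the model to
# "summarize the principle of photosynthesis in two sentences".
# "tiiuae/Falcon3-3B-Instruct" is the assumed Hub repo id.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Résume en deux phrases le principe de la photosynthèse."}
]
result = generator(messages, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"][-1]["content"])
```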