# Falcon3-3B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 3 billion |
| Context Length | 32K tokens |
| Languages | English, French, Spanish, Portuguese |
| License | TII Falcon-LLM License 2.0 |
| Release Date | December 2024 |
## What is Falcon3-3B-Instruct?
Falcon3-3B-Instruct is an instruction-tuned language model developed by the Technology Innovation Institute (TII). It is a pruned and optimized derivative of Falcon3-7B-Base, designed for strong reasoning, STEM tasks, and multilingual use. The model was trained on 100 gigatokens of diverse data and then post-trained on 1.2 million samples of specialized instruction content.
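For orientation, here is a minimal inference sketch. It assumes the checkpoint is published on the Hugging Face Hub as `tiiuae/Falcon3-3B-Instruct` and that a recent `transformers` (with chat-template support) plus `accelerate` are installed; treat it as an illustrative starting point rather than an official quickstart.

```python
# Minimal chat inference sketch. Assumes the checkpoint is published on the
# Hugging Face Hub as "tiiuae/Falcon3-3B-Instruct" and that a recent
# `transformers` (with chat-template support) and `accelerate` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Instruct"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~2 bytes/param: roughly 6 GB of weights
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain grouped query attention in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

At bfloat16 precision the 3 billion parameters occupy roughly 6 GB, so the model should fit comfortably on a single mid-range GPU.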
## Implementation Details
The model implements a transformer-based, causal decoder-only architecture with 22 decoder blocks. It uses Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, which shrinks the KV cache and speeds up inference. The architecture incorporates SwiGLU activations and RMSNorm, with a high RoPE base frequency (theta = 1,000,042) for enhanced long-context understanding. Key figures, which can be cross-checked with the config sketch after this list:
- Wide head dimension of 256
- 131K vocabulary size
- 32K context length support
- Trained on 1,024 H100 GPUs
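A short sketch for verifying the numbers above against the published configuration, assuming the checkpoint uses standard Llama-style config field names (the released file may differ slightly):

```python
# Sketch: cross-check the architecture figures against the published config.
# Assumes standard Llama-style config field names; the released checkpoint
# may name them differently.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("tiiuae/Falcon3-3B-Instruct")  # assumed Hub repo id

print(cfg.num_hidden_layers)        # decoder blocks, expected 22
print(cfg.num_attention_heads)      # query heads, expected 12
print(cfg.num_key_value_heads)      # key-value heads (GQA), expected 4
print(getattr(cfg, "head_dim", cfg.hidden_size // cfg.num_attention_heads))  # expected 256
print(cfg.vocab_size)               # expected ~131K
print(cfg.rope_theta)               # expected 1000042
print(cfg.max_position_embeddings)  # expected 32768 (32K)
```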
## Core Capabilities
- Strong performance in STEM and mathematical reasoning (78% on GSM8K with chain-of-thought prompting; see the sketch after this list)
- Exceptional results in scientific understanding (95.5% on SciQ)
- Robust multilingual support across four languages
- Advanced reasoning capabilities (45.4% on the BBH benchmark)
- Effective instruction following, with an average MT-Bench score of 7.2
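The GSM8K figure above was obtained with chain-of-thought prompting. The sketch below shows one way to elicit step-by-step reasoning via the `transformers` pipeline; the system prompt and word problem are invented for illustration and are not the evaluation setup behind the reported score.

```python
# Illustrative chain-of-thought prompt for a GSM8K-style word problem.
# The system prompt and the problem are invented for this sketch.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",  # assumed Hub repo id
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Reason step by step, then state the final answer on its own line."},
    {"role": "user", "content": "A library holds 120 books. It lends out 45 and gets 18 back. How many books are on the shelves now?"},
]
result = generator(messages, max_new_tokens=300, do_sample=False)
# With chat-format input, the pipeline appends the assistant turn to the conversation.
print(result[0]["generated_text"][-1]["content"])
```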
## Frequently Asked Questions
**Q: What makes this model unique?**
Falcon3-3B-Instruct stands out for an efficient architecture that achieves strong performance despite its compact size. It particularly excels at STEM and scientific tasks, outperforming larger models on specific benchmarks such as SciQ and MATH Level-5.
**Q: What are the recommended use cases?**
The model is particularly well suited to scientific and mathematical applications, multilingual content generation, and tasks requiring complex reasoning. It is a good fit for applications that need strong STEM performance at moderate computational cost; a brief multilingual sketch follows below.
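As a final illustration of the multilingual use case, a short sketch (again assuming the `tiiuae/Falcon3-3B-Instruct` Hub id) prompting the model in French:

```python
# Sketch of the multilingual use case: a French prompt asking the model to
# "summarize the principle of photosynthesis in two sentences".
# "tiiuae/Falcon3-3B-Instruct" is the assumed Hub repo id.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Résume en deux phrases le principe de la photosynthèse."}
]
result = generator(messages, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"][-1]["content"])
```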