Falcon3-10B-Base
| Property | Value |
|---|---|
| Parameter Count | 10 billion |
| Context Length | 32K tokens |
| Languages | English, French, Spanish, Portuguese |
| License | TII Falcon-LLM License 2.0 |
| Release Date | December 2024 |
What is Falcon3-10B-Base?
Falcon3-10B-Base is a state-of-the-art foundation model developed by the Technology Innovation Institute (TII). It represents a significant advancement in the Falcon3 family of Open Foundation Models, trained on 2 teratokens (2 trillion tokens) of diverse data spanning web, code, STEM, and multilingual content. This base model demonstrates exceptional performance in reasoning, language understanding, and mathematical tasks.
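As a plain base checkpoint, it can be loaded for text completion with the Hugging Face transformers library. The sketch below is a minimal example; the repo id `tiiuae/Falcon3-10B-Base` and the dtype/device settings are assumptions to verify against the official model card.

```python
# Minimal sketch: loading the base model for text completion with transformers.
# Repo id and dtype/device settings are assumptions; check the official card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 10B weights at roughly 20 GB
    device_map="auto",           # shard across available GPUs
)

# Base models continue text rather than follow instructions.
prompt = "The three laws of thermodynamics state that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```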
Implementation Details
The model uses a transformer-based, causal decoder-only architecture with 40 decoder blocks. It implements Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, and features a wider head dimension of 256. The architecture also includes SwiGLU activations and RMSNorm, with a high RoPE base value of 1,000,042 for enhanced long-context understanding (see the configuration sketch after the list below).
- Vocabulary size of 131K tokens
- Trained on 1,024 H100 GPUs
- Depth up-scaled from Falcon3-7B-Base
- Implements GQA for faster inference
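To see how these figures map onto a concrete checkpoint, the sketch below reads them back from the model configuration. It assumes the Llama-style config fields used by recent transformers releases; exact attribute names may vary by version, and the repo id is an assumption.

```python
# Sketch: inspecting the architecture hyperparameters described above.
# Field names follow the Llama-style config Falcon3 checkpoints are assumed to use.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")  # assumed repo id

print("decoder blocks:  ", cfg.num_hidden_layers)        # 40
print("query heads:     ", cfg.num_attention_heads)      # 12
print("key/value heads: ", cfg.num_key_value_heads)      # 4 (GQA: 3 query heads per KV head)
print("head dimension:  ", getattr(cfg, "head_dim",
                                   cfg.hidden_size // cfg.num_attention_heads))  # 256
print("RoPE base:       ", cfg.rope_theta)               # ~1,000,042
print("vocab size:      ", cfg.vocab_size)               # ~131K
print("context length:  ", cfg.max_position_embeddings)  # 32K
```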
Core Capabilities
- Achieves 81.4% accuracy on GSM8K (5-shot; a prompting sketch follows this list)
- 73.1% accuracy on MMLU (5-shot)
- 59.7% on BBH (3-shot)
- Strong performance in multilingual tasks
- Exceptional mathematical reasoning capabilities
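For context on what a "5-shot" score means in practice, the sketch below builds an illustrative few-shot prompt for a math question. The exemplars are invented for illustration and do not reproduce the official evaluation harness.

```python
# Illustrative few-shot prompting: exemplars are concatenated before the target
# question, as in the 5-shot GSM8K setting. Exemplars here are made up.
few_shot_examples = [
    ("A pen costs $2 and a notebook costs $3. How much do 2 pens and 1 notebook cost?",
     "2 * 2 + 3 = 7. The answer is 7."),
    ("Tom reads 15 pages a day. How many pages does he read in 4 days?",
     "15 * 4 = 60. The answer is 60."),
    # ... three more exemplars would complete a 5-shot prompt
]

question = "A train travels 60 km per hour for 3 hours. How far does it travel?"

prompt = ""
for q, a in few_shot_examples:
    prompt += f"Question: {q}\nAnswer: {a}\n\n"
prompt += f"Question: {question}\nAnswer:"

# `model` and `tokenizer` as loaded in the earlier sketch
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```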
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balanced performance across multiple domains, particularly excelling in mathematical reasoning and STEM tasks. Its architecture innovations, including GQA and high RoPE value, enable efficient processing of long contexts up to 32K tokens.
Q: What are the recommended use cases?
As a base model, it requires further fine-tuning through SFT, RLHF, or continued pretraining for specific applications. It's particularly well-suited for tasks involving mathematical reasoning, multilingual processing, and complex problem-solving scenarios.
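One common route to such fine-tuning is parameter-efficient SFT with LoRA. The sketch below assumes the peft library and Llama-style projection-module names (q_proj, k_proj, v_proj, o_proj); it is a starting point under those assumptions, not a prescribed recipe.

```python
# Minimal parameter-efficient SFT sketch using LoRA via the peft library.
# Target module names assume the Llama-style layers used by Falcon3;
# dataset, hyperparameters, and the training loop itself are left out.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-10B-Base",   # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trained

# From here, plug `model` into transformers.Trainer or trl's SFTTrainer with an
# instruction-tuning dataset to produce an instruct/chat variant.
```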