Falcon3-10B-Base
| Property | Value |
|---|---|
| Parameter Count | 10 billion |
| Context Length | 32K tokens |
| Languages | English, French, Spanish, Portuguese |
| License | TII Falcon-LLM License 2.0 |
| Release Date | December 2024 |
What is Falcon3-10B-Base?
Falcon3-10B-Base is a state-of-the-art foundation model developed by the Technology Innovation Institute (TII). It represents a significant advancement in the Falcon3 family of Open Foundation Models, trained on 2 teratokens (2 trillion tokens) of diverse data spanning web, code, STEM, and multilingual content. This base model demonstrates exceptional performance in reasoning, language understanding, and mathematical tasks.
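As a plain base checkpoint, it can be loaded for text completion with the Hugging Face transformers library. The sketch below is a minimal example; the repo id `tiiuae/Falcon3-10B-Base` and the dtype/device settings are assumptions to verify against the official model card.

```python
# Minimal sketch: loading the base model for text completion with transformers.
# Repo id and dtype/device settings are assumptions; check the official card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 10B weights at roughly 20 GB
    device_map="auto",           # shard across available GPUs
)

# Base models continue text rather than follow instructions.
prompt = "The three laws of thermodynamics state that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```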
Implementation Details
The model uses a transformer-based, causal decoder-only architecture with 40 decoder blocks. It implements Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, and features a wider head dimension of 256. The architecture also includes SwiGLU activations and RMSNorm, with a high RoPE base value of 1,000,042 for enhanced long-context understanding (see the configuration sketch after the list below).
- Vocabulary size of 131K tokens
- Trained on 1,024 H100 GPUs
- Depth up-scaled from Falcon3-7B-Base
- Implements GQA for faster inference
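To see how these figures map onto a concrete checkpoint, the sketch below reads them back from the model configuration. It assumes the Llama-style config fields used by recent transformers releases; exact attribute names may vary by version, and the repo id is an assumption.

```python
# Sketch: inspecting the architecture hyperparameters described above.
# Field names follow the Llama-style config Falcon3 checkpoints are assumed to use.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("tiiuae/Falcon3-10B-Base")  # assumed repo id

print("decoder blocks:  ", cfg.num_hidden_layers)        # 40
print("query heads:     ", cfg.num_attention_heads)      # 12
print("key/value heads: ", cfg.num_key_value_heads)      # 4 (GQA: 3 query heads per KV head)
print("head dimension:  ", getattr(cfg, "head_dim",
                                   cfg.hidden_size // cfg.num_attention_heads))  # 256
print("RoPE base:       ", cfg.rope_theta)               # ~1,000,042
print("vocab size:      ", cfg.vocab_size)               # ~131K
print("context length:  ", cfg.max_position_embeddings)  # 32K
```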
Core Capabilities
- Achieves 81.4% accuracy on GSM8K (5-shot; a prompting sketch follows this list)
- 73.1% accuracy on MMLU (5-shot)
- 59.7% on BBH (3-shot)
- Strong performance in multilingual tasks
- Exceptional mathematical reasoning capabilities
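For context on what a "5-shot" score means in practice, the sketch below builds an illustrative few-shot prompt for a math question. The exemplars are invented for illustration and do not reproduce the official evaluation harness.

```python
# Illustrative few-shot prompting: exemplars are concatenated before the target
# question, as in the 5-shot GSM8K setting. Exemplars here are made up.
few_shot_examples = [
    ("A pen costs $2 and a notebook costs $3. How much do 2 pens and 1 notebook cost?",
     "2 * 2 + 3 = 7. The answer is 7."),
    ("Tom reads 15 pages a day. How many pages does he read in 4 days?",
     "15 * 4 = 60. The answer is 60."),
    # ... three more exemplars would complete a 5-shot prompt
]

question = "A train travels 60 km per hour for 3 hours. How far does it travel?"

prompt = ""
for q, a in few_shot_examples:
    prompt += f"Question: {q}\nAnswer: {a}\n\n"
prompt += f"Question: {question}\nAnswer:"

# `model` and `tokenizer` as loaded in the earlier sketch
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```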
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balanced performance across multiple domains, particularly excelling in mathematical reasoning and STEM tasks. Its architecture innovations, including GQA and high RoPE value, enable efficient processing of long contexts up to 32K tokens.
Q: What are the recommended use cases?
As a base model, it requires further fine-tuning through SFT, RLHF, or continued pretraining for specific applications. It's particularly well-suited for tasks involving mathematical reasoning, multilingual processing, and complex problem-solving scenarios.
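One common route to such fine-tuning is parameter-efficient SFT with LoRA. The sketch below assumes the peft library and Llama-style projection-module names (q_proj, k_proj, v_proj, o_proj); it is a starting point under those assumptions, not a prescribed recipe.

```python
# Minimal parameter-efficient SFT sketch using LoRA via the peft library.
# Target module names assume the Llama-style layers used by Falcon3;
# dataset, hyperparameters, and the training loop itself are left out.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-10B-Base",   # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trained

# From here, plug `model` into transformers.Trainer or trl's SFTTrainer with an
# instruction-tuning dataset to produce an instruct/chat variant.
```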