Falcon3-10B-Base

Maintained by: tiiuae

  • Parameter Count: 10 Billion
  • Context Length: 32K tokens
  • Languages: English, French, Spanish, Portuguese
  • License: TII Falcon-LLM License 2.0
  • Release Date: December 2024

What is Falcon3-10B-Base?

Falcon3-10B-Base is a state-of-the-art foundation model developed by the Technology Innovation Institute (TII). It represents a significant advancement in the Falcon3 family of Open Foundation Models, trained on 2 Teratokens (2 trillion tokens) of diverse data spanning web, code, STEM, and multilingual content. This base model demonstrates strong performance in reasoning, language understanding, and mathematical tasks.

Implementation Details

The model uses a transformer-based, causal decoder-only architecture with 40 decoder blocks. It implements Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, featuring a wider head dimension of 256. The architecture also includes SwiGLU activations and RMSNorm, with a high RoPE base value of 1,000,042 for enhanced long-context understanding. A sketch that loads the model and inspects these parameters follows the list below.

  • Vocabulary size of 131K tokens
  • Trained on 1,024 H100 GPUs
  • Depth up-scaled from Falcon3-7B-Base
  • Implements GQA for faster inference
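
A minimal sketch of loading the model with the Hugging Face transformers library and checking these architecture details against its configuration. The repository id tiiuae/Falcon3-10B-Base and standard transformers config field names are assumed here; running the full model requires enough GPU memory for 10B parameters in bfloat16.

```python
# Sketch: load Falcon3-10B-Base and inspect its architecture parameters.
# Assumes the Hugging Face repo id "tiiuae/Falcon3-10B-Base"; config field
# names follow the standard transformers conventions and may vary by version.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base"

config = AutoConfig.from_pretrained(model_id)
print("decoder blocks:     ", config.num_hidden_layers)       # expected 40
print("query heads:        ", config.num_attention_heads)     # expected 12
print("key/value heads:    ", config.num_key_value_heads)     # expected 4 (GQA)
print("head dimension:     ", getattr(config, "head_dim",
      config.hidden_size // config.num_attention_heads))      # expected 256
print("RoPE base (theta):  ", config.rope_theta)              # expected 1000042
print("vocabulary size:    ", config.vocab_size)              # expected ~131K
print("max context length: ", config.max_position_embeddings) # expected 32K

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Quick sanity check: free-form completion from the base model.
inputs = tokenizer("The derivative of x^2 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```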

Core Capabilities

  • Achieves 81.4% accuracy on GSM8K (5-shot)
  • 73.1% accuracy on MMLU (5-shot)
  • 59.7% on BBH (3-shot)
  • Strong performance in multilingual tasks
  • Exceptional mathematical reasoning capabilities

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its balanced performance across multiple domains, particularly excelling in mathematical reasoning and STEM tasks. Its architecture innovations, including GQA and high RoPE value, enable efficient processing of long contexts up to 32K tokens.
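
As an illustration of why GQA helps, the toy example below applies grouped-query attention with the published head layout (12 query heads sharing 4 key-value heads, head dimension 256). It is a simplified sketch in plain PyTorch, not the model's actual implementation.

```python
# Illustrative sketch of Grouped Query Attention (GQA) using Falcon3-10B-Base's
# published head layout: 12 query heads, 4 key/value heads, head dim 256.
# Single-batch toy example only, not the model's real code.
import torch
import torch.nn.functional as F

n_q_heads, n_kv_heads, head_dim = 12, 4, 256
group_size = n_q_heads // n_kv_heads   # 3 query heads share each KV head
seq_len = 8

q = torch.randn(n_q_heads, seq_len, head_dim)
k = torch.randn(n_kv_heads, seq_len, head_dim)
v = torch.randn(n_kv_heads, seq_len, head_dim)

# Expand each KV head so it is shared by its group of query heads.
k = k.repeat_interleave(group_size, dim=0)   # -> (12, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=0)

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5            # (12, seq, seq)
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
scores = scores.masked_fill(causal_mask, float("-inf"))
attn = F.softmax(scores, dim=-1) @ v                          # (12, seq, head_dim)

print(attn.shape)  # torch.Size([12, 8, 256])
# The KV cache only stores 4 heads instead of 12, which is why GQA reduces
# memory traffic and speeds up long-context inference.
```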

Q: What are the recommended use cases?

As a base model, it requires further fine-tuning through SFT, RLHF, or continued pretraining for specific applications. It's particularly well-suited for tasks involving mathematical reasoning, multilingual processing, and complex problem-solving scenarios.
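
Below is a minimal sketch of one possible SFT setup using the Hugging Face Trainer with a plain causal language modeling objective. The tiny in-memory dataset is purely hypothetical, and a 10B-parameter model will in practice need parameter-efficient methods (e.g., LoRA) or multi-GPU training; this only outlines the shape of the workflow.

```python
# Hypothetical SFT sketch on top of Falcon3-10B-Base with the HF Trainer.
# Real fine-tuning would use a curated instruction dataset and likely
# LoRA/PEFT or multiple GPUs; this example only shows the workflow.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "tiiuae/Falcon3-10B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical instruction-style examples, formatted as plain text.
examples = [
    {"text": "Question: What is 12 * 7?\nAnswer: 84"},
    {"text": "Question: Translate 'bonjour' to English.\nAnswer: hello"},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="falcon3-10b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
        logging_steps=1,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```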
