Falcon3-Mamba-7B-Instruct

Maintained by: tiiuae

Parameter Count: 7 Billion
Context Length: 32K tokens
Architecture: Mamba-based causal decoder
License: TII Falcon-LLM License 2.0
Release Date: December 2024

What is Falcon3-Mamba-7B-Instruct?

Falcon3-Mamba-7B-Instruct is an advanced language model developed by the Technology Innovation Institute (TII) that represents a significant evolution in the Falcon family of models. Built on a Mamba-based architecture, it was trained on 1,500 gigatokens (1.5 trillion tokens) of diverse data, including web content, code, and STEM materials, followed by specialized instruction tuning on 1.2 million carefully curated samples.

Implementation Details

The model features 64 decoder blocks with a width of 4096. It employs a state size of 16 and supports a context length of 32K tokens, making it particularly suitable for long-form content processing. The model uses a 65K-token vocabulary and builds on the foundation of Falcon-Mamba-7B with additional specialized training.

  • 64 decoder blocks with 4096 width dimension
  • 32K context length capability
  • 65K vocabulary size
  • Mamba-based architecture for efficient processing
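
To make the setup concrete, here is a minimal loading sketch using the Hugging Face transformers library. It assumes the hub id tiiuae/Falcon3-Mamba-7B-Instruct and a recent transformers release with FalconMamba support; treat it as an illustration rather than official usage guidance.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "tiiuae/Falcon3-Mamba-7B-Instruct"  # hub id as listed above

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.bfloat16,  # half precision keeps the 7B weights near 14 GB
      device_map="auto",           # spread layers across available devices
  )

  prompt = "Summarize the advantages of state-space models over attention."
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=256)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))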

Core Capabilities

  • Strong performance in STEM and reasoning tasks
  • Excellent results in language understanding (93.6% on SciQ)
  • Robust instruction following capabilities (71.7% on IFEval)
  • Advanced mathematical reasoning (65.2% on GSM8K)
  • Comprehensive common sense understanding (80.9% on PIQA)
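
Because the model is instruction-tuned, prompts normally go through its chat template. The hedged sketch below shows the standard transformers pattern (apply_chat_template), reusing the tokenizer and model from the loading example above and assuming the tokenizer ships a chat template, as instruct models on the Hub typically do.

  # Build a chat-formatted prompt from a list of messages.
  messages = [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "Solve step by step: what is 17 * 24?"},
  ]
  input_ids = tokenizer.apply_chat_template(
      messages,
      add_generation_prompt=True,  # append the assistant-turn marker
      return_tensors="pt",
  ).to(model.device)

  outputs = model.generate(input_ids, max_new_tokens=128)
  # Decode only the newly generated tokens, skipping the prompt.
  print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))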

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its use of the Mamba architecture combined with extensive training on high-quality STEM and code data. It achieves state-of-the-art results for its size category in reasoning and mathematical tasks, while maintaining strong performance across general language understanding benchmarks.

Q: What are the recommended use cases?

Given its strong performance on STEM, reasoning, and instruction-following tasks, this model is particularly well-suited for educational applications, technical documentation, scientific content generation, and complex problem-solving scenarios. The 32K context length makes it excellent for processing longer documents and maintaining context in extended conversations.
