FalconLite

Maintained by: amazon

Developer: AWS Contributors
Base Model: Falcon 40B
License: Apache 2.0
Maximum Context Length: 11K tokens

What is FalconLite?

FalconLite is a quantized version of the Falcon 40B SFT OASST-TOP1 model, designed to handle long input sequences while significantly reducing GPU memory requirements. By combining 4-bit GPTQ quantization with an adapted dynamic NTK RotaryEmbedding, it processes contexts of up to 11K tokens with roughly 4x less GPU memory than the original model, while retaining strong retrieval and question-answering performance (see Core Capabilities below).
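
To make the context-extension technique concrete, the sketch below shows dynamic NTK-aware scaling of rotary position embeddings in the form commonly used by open-source implementations: once the sequence grows past the length the model was trained on, the RoPE base is rescaled so longer positions remain distinguishable. This is an illustrative NumPy sketch, not FalconLite's actual code; the dim, base, and max_position_embeddings defaults are placeholders.

  import numpy as np

  def dynamic_ntk_inv_freq(seq_len: int,
                           dim: int = 64,                        # per-head rotary dim (placeholder)
                           base: float = 10000.0,                # standard RoPE base
                           max_position_embeddings: int = 2048,  # assumed training context
                           scaling_factor: float = 1.0) -> np.ndarray:
      """Return RoPE inverse frequencies, rescaling the base once the
      sequence exceeds the training length (dynamic NTK-aware scaling)."""
      if seq_len > max_position_embeddings:
          # Grow the base so the positional encoding stretches over longer inputs.
          base = base * (
              (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)
          ) ** (dim / (dim - 2))
      return 1.0 / (base ** (np.arange(0, dim, 2, dtype=np.float64) / dim))

  # Frequencies are unchanged at the training length but stretched at 11K tokens.
  print(dynamic_ntk_inv_freq(2048)[:3])
  print(dynamic_ntk_inv_freq(11000)[:3])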

Implementation Details

The model can be deployed on a single AWS g5.12xlarge instance using Text-Generation-Inference (TGI) 0.9.2 and handles sequences of up to 11K tokens, making it well suited to resource-constrained environments. A minimal client sketch follows the feature list below.

  • 4-bit GPTQ quantization for memory efficiency
  • Dynamic NTK RotaryEmbedding for enhanced performance
  • Deployable on a single AWS g5.12xlarge instance
  • Compatible with Text-Generation-Inference 0.9.2
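
As a rough sketch of how such a deployment is queried, the Python example below posts a request to TGI's /generate REST endpoint. The host and port are placeholders for wherever the container is running, and the OASST-style prompt markers are assumed from the base model's template; check the model card for the exact format and recommended generation parameters.

  import requests

  # Placeholder endpoint: assumes a TGI 0.9.2 container serving FalconLite locally.
  TGI_URL = "http://localhost:8080/generate"

  # OASST-style prompt template assumed from the Falcon 40B SFT OASST-TOP1 base model.
  prompt = (
      "<|prompter|>Summarize the key points of the following document: ..."
      "<|endoftext|><|assistant|>"
  )

  payload = {
      "inputs": prompt,
      "parameters": {
          "max_new_tokens": 256,
          "temperature": 0.7,
          "do_sample": True,
      },
  }

  response = requests.post(TGI_URL, json=payload, timeout=300)
  response.raise_for_status()
  print(response.json()["generated_text"])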

Core Capabilities

  • 100% accuracy in Pass Key Retrieval across all input lengths (an illustrative test setup is sketched after this list)
  • 92-100% accuracy in Topic Retrieval for contexts up to 11K tokens
  • 46.9% accuracy in Question Answering with Long Input Texts
  • Processing speed of 10-18 tokens per second depending on input length
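
For context, pass key retrieval is a synthetic long-context test: a random key is buried inside filler text and the model is asked to repeat it back. The sketch below shows one illustrative way to build such a prompt; the filler sentence, key format, and target length are arbitrary choices, not the exact benchmark used to evaluate FalconLite.

  import random

  def build_passkey_prompt(target_words: int = 8000) -> tuple[str, str]:
      """Build a long prompt with a hidden pass key; return (prompt, key)."""
      key = str(random.randint(10000, 99999))
      filler = "The grass is green. The sky is blue. The sun is yellow. "
      needle = f"The pass key is {key}. Remember it. {key} is the pass key. "
      chunks = []
      words = 0
      while words < target_words:
          chunks.append(filler)
          words += len(filler.split())
      # Hide the key roughly in the middle of the filler text.
      chunks.insert(len(chunks) // 2, needle)
      question = "What is the pass key? The pass key is"
      return "".join(chunks) + question, key

  prompt, key = build_passkey_prompt()
  # Send `prompt` to the model (e.g. via the TGI client sketched above) and
  # check whether `key` appears in the generated answer.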

Frequently Asked Questions

Q: What makes this model unique?

FalconLite stands out for its ability to process 5x longer contexts than the original model while using 4x less GPU memory, making it highly efficient for resource-constrained deployments.

Q: What are the recommended use cases?

The model excels in applications that require long-context processing, such as topic retrieval, summarization, and question answering. It is particularly suited to deployments where GPU memory is limited but long-context performance still matters.
