long_llama_3b

Maintained By
syzymon

LongLLaMA 3B

Parameter Count: 3.43B
License: Apache 2.0
Research Paper: Focused Transformer: Contrastive Training for Context Scaling
Training Data: RedPajama-Data-1T
Context Length: Up to 256k tokens

What is long_llama_3b?

LongLLaMA 3B is a language model designed to extend the context lengths that transformer architectures can handle. Built upon OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method, it can process inputs of 256,000 tokens or more, far beyond the context limits of the base model.

Implementation Details

The model implements the Focused Transformer architecture with three specific memory layers (6, 12, and 18) for context extension. It utilizes a unique contrastive training approach where memory attention layers are exposed to both relevant and irrelevant keys, enabling better semantic differentiation and context length extrapolation.
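The core idea can be illustrated with a toy example. The sketch below is a simplified, dependency-free illustration of single-query attention over an external memory, not the model's actual implementation: during FoT training, the memory mixes keys from the current document (relevant) with keys from other documents (irrelevant), and the contrastive objective teaches the memory layers to concentrate attention weight on the relevant keys.

```python
import math

def memory_attention(query, mem_keys, mem_values):
    """Toy single-query softmax attention over an external memory.

    In FoT training, mem_keys would mix keys from the same document
    (relevant) with keys from other documents (irrelevant); a memory
    layer that differentiates them puts most weight on relevant keys.
    """
    # Scaled dot-product scores against every memory key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in mem_keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the memory values.
    return [sum(w * v[i] for w, v in zip(weights, mem_values))
            for i in range(len(mem_values[0]))]

# A query aligned with the first key attends almost entirely to it.
out = memory_attention([10.0, 0.0],
                       [[1.0, 0.0], [0.0, 1.0]],   # one relevant, one irrelevant key
                       [[1.0, 0.0], [0.0, 1.0]])
```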

  • Built on OpenLLaMA 3B base model
  • Trained on 1T tokens (base) + 10B tokens (fine-tuning)
  • Supports both F32 and BF16 tensor types
  • Implements automatic context window splitting for long inputs
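The window-splitting step above can be sketched as follows. This is a simplified illustration of the idea, not LongLLaMA's actual code: long inputs are cut into fixed-size windows that the local layers process one at a time, while the memory layers attend across windows.

```python
def split_into_windows(token_ids, window_size=2048):
    """Split a long token sequence into consecutive fixed-size windows.

    Simplified sketch of automatic context-window splitting: each
    window fits the local attention span; the final window may be
    shorter than window_size.
    """
    return [token_ids[i:i + window_size]
            for i in range(0, len(token_ids), window_size)]

# A 5000-token input yields windows of 2048, 2048, and 904 tokens.
windows = split_into_windows(list(range(5000)), window_size=2048)
```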

Core Capabilities

  • Handles extremely long context lengths (up to 256k tokens)
  • Maintains performance parity with original OpenLLaMA on standard benchmarks
  • Improved performance on long-context tasks like TREC and WebQS
  • Drop-in replacement for standard LLaMA implementations
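Because the model is a drop-in replacement for LLaMA checkpoints, it can be loaded through the standard Hugging Face transformers API. The sketch below follows the usual pattern for custom-architecture checkpoints (the `trust_remote_code=True` flag pulls in the model's FoT-specific code); exact arguments may differ from the upstream model card, so treat this as an assumption to verify against the repository.

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

def load_long_llama(model_id="syzymon/long_llama_3b"):
    """Load LongLLaMA 3B via the standard transformers API.

    Sketch only: trust_remote_code=True is needed because the FoT
    memory layers are defined in the checkpoint's own modeling code.
    """
    tokenizer = LlamaTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float32,  # F32; BF16 is also supported
        trust_remote_code=True,
    )
    return tokenizer, model
```

From there, generation works like any other causal LM: tokenize a prompt, call `model.generate`, and decode the result.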

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to handle extremely long contexts through the Focused Transformer architecture, while maintaining performance on standard tasks. It achieves this without requiring training on full-length sequences.

Q: What are the recommended use cases?

The model excels in tasks requiring long context processing, such as document analysis, long-form text generation, and question-answering over extended contexts. It's particularly useful for applications needing to process or generate text beyond traditional context windows.
