Yarn-Llama-2-70b-32k-2.4bpw-h6-exl2

Maintained By
LoneStriker


Property          Value
Base Model        LLaMA-2-70B
Context Window    32,000 tokens
License           Apache 2.0
Paper             arXiv:2309.00071

What is Yarn-Llama-2-70b-32k-2.4bpw-h6-exl2?

This model extends LLaMA-2-70B with significantly improved long-context handling. It was further pretrained for 400 steps using the YaRN context-extension method, enabling it to process up to 32,000 tokens of context - a substantial improvement over the base model's 4,096-token limit. The name suffix indicates an ExLlamaV2 (exl2) quantization at an average of 2.4 bits per weight with a 6-bit head layer, which dramatically reduces the memory needed to run the 70B model.
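As a rough sanity check on the quantization savings, the arithmetic below assumes the 2.4 bits-per-weight figure applies uniformly to all 70B parameters (it ignores the higher-precision head layer and quantization metadata, so real files are somewhat larger):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes at a given average bit width."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = quantized_size_gb(70e9, 16)    # full-precision baseline: 140 GB
exl2_gb = quantized_size_gb(70e9, 2.4)   # 2.4bpw quantization: 21 GB

print(f"fp16: {fp16_gb:.0f} GB, 2.4bpw: {exl2_gb:.0f} GB")
```

At roughly 21 GB of weights, the quantized model fits on a single 24 GB GPU, which is the usual motivation for such an aggressive bit width.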

Implementation Details

The model requires specific loading parameters, including Flash Attention 2 and bfloat16 precision. It was trained on the JUWELS supercomputer with support from LAION AI. Perplexity stays low across context lengths, improving from 3.61 at 1k tokens to 2.23 at 32k tokens as the model makes use of the longer context.

  • Requires trust_remote_code=True parameter
  • Utilizes Flash Attention 2 for efficient processing
  • Implements bfloat16 precision for optimal performance
  • Compatible with the latest transformers library
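The YaRN extension behind the longer context can be sketched as an "NTK-by-parts" rescaling of the RoPE inverse frequencies. The sketch below is an illustration, not the model's actual loading code; the ramp bounds (alpha=1, beta=32) and the attention-temperature formula follow the defaults described in the YaRN paper:

```python
import math

def yarn_rope_freqs(dim=128, base=10000.0, orig_ctx=4096, new_ctx=32768,
                    alpha=1.0, beta=32.0):
    """Rescale RoPE inverse frequencies with YaRN's NTK-by-parts ramp.

    Low-frequency dims (wavelength >= original context) are fully
    interpolated by the scale factor s; high-frequency dims are left
    unchanged; dims in between are blended linearly.
    """
    s = new_ctx / orig_ctx  # context scale factor: 32768 / 4096 = 8
    freqs = []
    for d in range(0, dim, 2):
        theta = base ** (-d / dim)        # original inverse frequency
        wavelength = 2 * math.pi / theta
        r = orig_ctx / wavelength         # rotations over the original context
        gamma = min(max((r - alpha) / (beta - alpha), 0.0), 1.0)
        freqs.append((1 - gamma) * theta / s + gamma * theta)
    return s, freqs

s, freqs = yarn_rope_freqs()
# YaRN additionally scales attention logits by sqrt(1/t), where
# sqrt(1/t) = 0.1 * ln(s) + 1 is the paper's recommended temperature.
temp_scale = 0.1 * math.log(s) + 1
```

The key design choice is that the highest-frequency dimensions, which encode fine-grained relative positions, are left untouched, while only the slow dimensions are stretched to cover the new range.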

Core Capabilities

  • Extended context window of 32k tokens
  • Maintained performance on standard benchmarks (ARC-c: 67.41, MMLU: 68.84)
  • Improved long-context processing with minimal quality degradation
  • Improved TruthfulQA score (46.14 vs the base model's 44.92)

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to handle extremely long contexts (32k tokens) while maintaining or improving performance across various benchmarks compared to the base LLaMA-2-70B model. This makes it particularly suitable for tasks requiring extensive context processing.

Q: What are the recommended use cases?

The model is ideal for applications requiring long-form content analysis, document processing, and complex reasoning tasks that benefit from extended context windows. It's particularly well-suited for tasks like document summarization, long-form question answering, and analysis of extensive text passages.
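For documents that still exceed the 32k window, a simple token-budget calculation shows how much the larger window reduces the number of passes. This is a hypothetical helper for illustration; real code would count tokens with the model's tokenizer rather than assume a fixed total:

```python
def num_chunks(total_tokens: int, window: int, overlap: int = 0) -> int:
    """Number of sliding-window passes needed to cover total_tokens."""
    if total_tokens <= window:
        return 1
    stride = window - overlap
    # ceiling division over the tokens remaining after the first window
    return 1 + -(-(total_tokens - window) // stride)

doc = 100_000  # tokens in a long report
print(num_chunks(doc, 4_096))    # base LLaMA-2 window: 25 passes
print(num_chunks(doc, 32_000))   # this model's window: 4 passes
```

Fewer passes mean fewer chunk boundaries where cross-references can be lost, which is the practical advantage for summarization and long-form QA.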
