# Yarn-Llama-2-70b-32k-2.4bpw-h6-exl2
| Property | Value |
|---|---|
| Base Model | LLaMA-2-70B |
| Context Window | 32,768 tokens (32k) |
| License | Apache 2.0 |
| Paper | [arXiv:2309.00071](https://arxiv.org/abs/2309.00071) |
## What is Yarn-Llama-2-70b-32k-2.4bpw-h6-exl2?
This model extends LLaMA-2-70B with substantially longer context handling. Starting from the base model, it was further pretrained for 400 steps using the YaRN context-extension method, raising the usable context from the original 4,096 tokens to 32,768. The `2.4bpw-h6-exl2` suffix indicates that this particular release is an ExLlamaV2 quantization of those weights at roughly 2.4 bits per weight with a 6-bit quantized head layer.
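For intuition, here is a minimal, illustrative sketch of the "NTK-by-parts" RoPE rescaling that the YaRN paper (arXiv:2309.00071) describes. The function names are hypothetical; the defaults (`scale=8`, `beta_fast=32`, `beta_slow=1`) follow the paper's LLaMA settings, and the actual modeling code shipped with the model is more involved.

```python
import math

def yarn_inv_freq(head_dim, base=10000.0, orig_ctx=4096, scale=8.0,
                  beta_fast=32.0, beta_slow=1.0):
    """Illustrative YaRN "NTK-by-parts" rescaling of RoPE frequencies."""
    new_freqs = []
    for i in range(0, head_dim, 2):
        theta = base ** (-i / head_dim)               # standard RoPE frequency
        rotations = orig_ctx * theta / (2 * math.pi)  # turns over the original context
        # ramp: 0 = fully interpolate (theta / scale), 1 = leave unchanged
        gamma = min(1.0, max(0.0, (rotations - beta_slow) / (beta_fast - beta_slow)))
        new_freqs.append((1.0 - gamma) * theta / scale + gamma * theta)
    return new_freqs

def yarn_attention_factor(scale=8.0):
    # temperature correction from the paper: sqrt(1/t) = 0.1 * ln(s) + 1
    return 0.1 * math.log(scale) + 1.0
```

The scale factor is simply the ratio of new to old context, 32768 / 4096 = 8; high-frequency dimensions (those completing many rotations over the original window) are left untouched, while low-frequency ones are linearly interpolated.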
## Implementation Details
The model requires specific loading parameters, including Flash Attention 2 and bfloat16 precision (a loading sketch follows the list below). It was trained on the JUWELS supercomputer with support from LAION AI. Perplexity improves steadily as more context is supplied, from 3.61 at 1k tokens to 2.23 at 32k tokens, indicating the model genuinely exploits the extended window.
- Requires the `trust_remote_code=True` parameter when loading
- Uses Flash Attention 2 for efficient attention over long sequences
- Runs in bfloat16 precision for best performance
- Compatible with recent versions of the `transformers` library
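A minimal loading sketch, under the assumption that the unquantized weights live at the `NousResearch/Yarn-Llama-2-70b-32k` Hugging Face repo (the exl2 quantization this card is named after would instead be loaded through ExLlamaV2):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "NousResearch/Yarn-Llama-2-70b-32k"  # assumed repo id for the full-precision weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,                # recommended precision
    attn_implementation="flash_attention_2",   # requires the flash-attn package
    device_map="auto",                         # shard across available GPUs
    trust_remote_code=True,                    # custom YaRN modeling code
)
```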
## Core Capabilities
- Extended context window of 32k tokens
- Maintained performance on standard benchmarks (ARC-c: 67.41, MMLU: 68.84)
- Improved long-context processing with minimal quality degradation (a measurement sketch follows this list)
- Improved TruthfulQA score (46.14 vs. 44.92 for the base model)
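The perplexity figures quoted earlier are the kind of numbers produced by a truncated-context evaluation: score the same long documents while varying how many tokens the model may condition on. A hedged sketch, reusing `model` and `tokenizer` from the loading example (the exact corpus and windowing behind the published numbers are not specified here):

```python
import math
import torch

def perplexity_at_length(model, tokenizer, text, ctx_len):
    # Hypothetical helper: perplexity of `text` truncated to `ctx_len` tokens.
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :ctx_len].to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return math.exp(loss.item())

# e.g. compare short- vs. long-context perplexity on one long document:
# for n in (1024, 8192, 32768):
#     print(n, perplexity_at_length(model, tokenizer, long_doc, n))
```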
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its ability to handle extremely long contexts (32k tokens) while maintaining or improving performance across various benchmarks compared to the base LLaMA-2-70B model. This makes it particularly suitable for tasks requiring extensive context processing.
**Q: What are the recommended use cases?**
The model is ideal for applications requiring long-form content analysis, document processing, and complex reasoning tasks that benefit from extended context windows. It's particularly well-suited for tasks like document summarization, long-form question answering, and analysis of extensive text passages.
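As a simple illustration of the long-document use case, a summarization call might look like the following. It assumes `model` and `tokenizer` from the loading sketch above and a user-supplied `long_document` of up to roughly 32k tokens; the prompt format is illustrative only, since this is a base model rather than an instruction-tuned one.

```python
# Completion-style prompt: the base model continues the text after the cue line.
prompt = long_document + "\n\nSummary of the document above:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```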