Athene-V2-Chat-4.65bpw-h6-exl2

wolfram

A 72B-parameter chat model quantized to 4.65 bits per weight. It is reported to rival GPT-4o, supports a 32K-token context, and is aimed at chat, math, and coding tasks.

Property          Value
Base Model        Qwen/Qwen2.5-72B-Instruct
License           Nexusflow Research License
Context Length    32K tokens
Quantization      4.65 bits per weight
VRAM Requirement  48 GB

What is Athene-V2-Chat-4.65bpw-h6-exl2?

Athene-V2-Chat-4.65bpw-h6-exl2 is an EXL2-quantized version of the original Athene-V2-Chat model, which was developed by Nexusflow; this quantization is published by wolfram. It is designed to retain the original model's GPT-4o-level performance while using far less memory, so that a 72B-parameter model can be deployed on 48GB of VRAM.

Implementation Details

The model is built upon Qwen2.5-72B-Instruct and has been fine-tuned using RLHF (Reinforcement Learning from Human Feedback). The EXL2 quantization averages 4.65 bits per weight, with the output head kept at 6 bits (the "h6" in the name). Combined with a Q4 (4-bit) KV cache, it runs on systems with 48GB VRAM while keeping the full 32K-token context window.

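  • EXL2 quantization at an average of 4.65 bits per weight
  • Sized for 48GB VRAM systems when used with a Q4 cache
  • Maintains full 32K context window capability
  • Distributed in EXL2 format, which is loaded by ExLlamaV2-based backends rather than directly by the Transformers library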
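
The 48GB figure can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: it takes the nominal 72B parameter count and the 4.65 bits-per-weight figure from this page, and assumes an attention layout of 80 layers, 8 key/value heads, and a head dimension of 128 for the Qwen2.5-72B base (these values are not stated here and should be checked against the base model's config). It ignores activation memory, quantization scale overhead, and framework overhead, so real usage will be somewhat higher.

```python
# Rough VRAM estimate: quantized weights plus KV cache at full context.
# The architecture constants are ASSUMPTIONS for illustration only.

PARAMS = 72e9                 # nominal parameter count ("72B")
BITS_PER_WEIGHT = 4.65        # from the quantization name
CONTEXT_TOKENS = 32 * 1024    # 32K-token context window

# Assumed attention layout of the base model (grouped-query attention).
NUM_LAYERS = 80
NUM_KV_HEADS = 8
HEAD_DIM = 128

GB = 1e9  # decimal gigabytes


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the quantized weights."""
    return params * bits_per_weight / 8


def kv_cache_bytes(tokens: int, bits_per_element: float) -> float:
    """Bytes needed for the KV cache at a given element precision."""
    # Factor 2: one key vector and one value vector per KV head per layer.
    elements_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM
    return tokens * elements_per_token * bits_per_element / 8


weights_gb = weight_bytes(PARAMS, BITS_PER_WEIGHT) / GB
kv_q4_gb = kv_cache_bytes(CONTEXT_TOKENS, 4) / GB
kv_fp16_gb = kv_cache_bytes(CONTEXT_TOKENS, 16) / GB

print(f"weights:         {weights_gb:.2f} GB")
print(f"KV cache (Q4):   {kv_q4_gb:.2f} GB -> total {weights_gb + kv_q4_gb:.2f} GB")
print(f"KV cache (FP16): {kv_fp16_gb:.2f} GB -> total {weights_gb + kv_fp16_gb:.2f} GB")
```

Under these assumptions the weights alone come to roughly 42 GB, which leaves only a few gigabytes of headroom on a 48GB system. A 4-bit cache for the full 32K context fits in that headroom, while a 16-bit cache would not, which is consistent with the Q4 cache being called out above.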

Core Capabilities

  • Exceptional performance in chat interactions
  • Strong mathematical reasoning abilities
  • Advanced coding capabilities
  • Reported to match GPT-4o across various benchmarks
  • Handles long inputs within the 32K-token context window

Frequently Asked Questions

Q: What makes this model unique?

The model combines performance reported as comparable to GPT-4o with a much smaller memory footprint, achieved through 4.65 bits-per-weight EXL2 quantization. This makes it particularly valuable for research and production deployments that need high performance within a 48GB VRAM budget.

Q: What are the recommended use cases?

The model excels in chat applications, mathematical problem-solving, and coding tasks. It is particularly well-suited for applications requiring long-context understanding (up to 32K tokens) and complex reasoning on systems with 48GB of VRAM.
