# EXAONE-Deep-7.8B
| Property | Value |
|---|---|
| Parameter Count | 7.8B (6.98B without embeddings) |
| Context Length | 32,768 tokens |
| Architecture | 32 layers, GQA with 32 Q-heads and 8 KV-heads |
| License | EXAONE AI Model License Agreement 1.1 - NC |
| Vocabulary Size | 102,400 |
## What is EXAONE-Deep-7.8B?
EXAONE-Deep-7.8B is an advanced language model developed by LG AI Research, specifically designed to excel in reasoning tasks including mathematics and coding. The model represents a significant achievement in balancing size and performance, outperforming both open-weight models of comparable scale and proprietary models like OpenAI's o1-mini.
## Implementation Details
The model uses a 32-layer architecture with Grouped-Query Attention (GQA), pairing 32 query heads with 8 key-value heads to cut key-value cache memory during inference. Combined with a 32,768-token context window and a 102,400-token vocabulary, this makes it well suited to long, complex inputs.
- Advanced reasoning capabilities optimized for mathematical and coding problems
- Extensive context window supporting long-form reasoning
- Efficient architecture with GQA implementation
- Comprehensive vocabulary for diverse task handling
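The memory benefit of the GQA layout above can be illustrated with a back-of-the-envelope KV-cache calculation. The layer and head counts come from the table; the head dimension and fp16 storage are assumptions for illustration, not published values:

```python
# Back-of-the-envelope KV-cache sizing for a GQA model with
# EXAONE-Deep-7.8B's layer/head counts. head_dim=128 and fp16
# storage are illustrative assumptions.
LAYERS = 32
Q_HEADS = 32
KV_HEADS = 8
HEAD_DIM = 128        # assumed
BYTES_PER_VALUE = 2   # fp16, assumed
CONTEXT = 32_768

def kv_cache_bytes(kv_heads: int) -> int:
    # Two tensors (K and V) per layer, one slot per cached token.
    return 2 * LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * CONTEXT

mha = kv_cache_bytes(Q_HEADS)   # if every query head kept its own KV head
gqa = kv_cache_bytes(KV_HEADS)  # actual grouped layout
print(f"MHA-style cache: {mha / 2**30:.1f} GiB")  # prints 16.0 GiB
print(f"GQA cache: {gqa / 2**30:.1f} GiB")        # prints 4.0 GiB
```

Under these assumptions, grouping 32 query heads onto 8 KV heads shrinks the full-context KV cache by a factor of four.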
## Core Capabilities
- Achieves 94.8% accuracy on MATH-500 benchmark
- 70.0% pass rate on AIME 2024 with 83.3% consistency
- 89.9% accuracy on CSAT Math 2025
- Strong coding performance with a 55.2% pass rate on LiveCodeBench
- Supports various deployment frameworks including TensorRT-LLM, vLLM, and SGLang
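As a sketch of one such deployment path, the model could be served through vLLM's OpenAI-compatible server. The Hugging Face repository id and flags below are illustrative assumptions and should be checked against your vLLM version:

```shell
# Sketch: serve the model via vLLM's OpenAI-compatible API server.
# Repository id and flags are illustrative; verify before use.
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model LGAI-EXAONE/EXAONE-Deep-7.8B \
    --max-model-len 32768
```

TensorRT-LLM and SGLang follow analogous serve-and-query workflows; consult each framework's documentation for the exact launch commands.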
## Frequently Asked Questions
**Q: What makes this model unique?**
EXAONE-Deep-7.8B stands out for its exceptional reasoning capabilities despite its moderate size, offering performance that competes with larger models while maintaining efficiency. Its specialized architecture and training make it particularly effective for mathematical and scientific reasoning tasks.
**Q: What are the recommended use cases?**
The model excels in mathematical problem-solving, coding tasks, and complex reasoning scenarios. It's particularly well-suited for educational applications, technical problem-solving, and situations requiring detailed step-by-step reasoning. The model performs best when prompts include structured reasoning requests and clear instruction patterns.
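A minimal sketch of such a structured reasoning request, assuming a generic chat-message format (the instruction wording and message shape here are illustrative assumptions; the authoritative chat template ships with the model's tokenizer):

```python
# Hypothetical helper showing a "structured reasoning request" of the
# kind described above. Wording is an illustrative assumption, not the
# official prompt template.
def build_reasoning_messages(question: str) -> list:
    instruction = ("Please reason step by step, and put your "
                   "final answer within \\boxed{}.")
    return [{"role": "user", "content": f"{question}\n{instruction}"}]

messages = build_reasoning_messages("How many primes are below 20?")
print(messages[0]["content"])
```

The explicit step-by-step instruction and a well-defined answer format are the kind of "clear instruction patterns" the model responds to best.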