# EXAONE-Deep-2.4B-GGUF
| Property | Value |
|---|---|
| Parameters | 2.14B |
| Context Length | 32,768 tokens |
| License | EXAONE AI Model License Agreement 1.1 - NC |
| Architecture | 30 layers, GQA with 32 Q-heads and 8 KV-heads |
| Vocabulary Size | 102,400 |
## What is EXAONE-Deep-2.4B-GGUF?
EXAONE-Deep-2.4B-GGUF is a reasoning-focused language model developed by LG AI Research and distributed in GGUF format, designed for strong performance on mathematics and coding tasks. It balances model size and capability, delivering competitive reasoning in a compact 2.4B-parameter package (2.14B parameters excluding embeddings).
## Implementation Details
The model features a sophisticated architecture utilizing Grouped-Query Attention (GQA) with 32 query heads and 8 key-value heads, spread across 30 layers. It supports an extensive context window of 32,768 tokens and implements tied word embeddings, distinguishing it from its larger siblings in the EXAONE family.
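To make the memory benefit of GQA concrete, here is a minimal sizing sketch. The head counts, layer count, and context length come from the table above; the head dimension of 80 is an assumption for illustration only:

```python
# Rough KV-cache sizing: GQA (8 KV-heads) vs. hypothetical full MHA (32 KV-heads).
# head_dim=80 is an illustrative assumption, not a published spec.
def kv_cache_bytes(num_layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    # 2x for the separate K and V caches; fp16 (2 bytes/element) by default.
    return 2 * num_layers * kv_heads * head_dim * context_len * bytes_per_elem

gqa = kv_cache_bytes(num_layers=30, kv_heads=8, head_dim=80, context_len=32768)
mha = kv_cache_bytes(num_layers=30, kv_heads=32, head_dim=80, context_len=32768)
print(f"GQA cache: {gqa / 2**20:.0f} MiB, MHA would be {mha / gqa:.0f}x larger")
# -> GQA cache: 2400 MiB, MHA would be 4x larger
```

With 8 KV-heads shared across 32 query heads, the KV cache at full context is a quarter of what standard multi-head attention would require.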
- Multiple quantization options including Q8_0, Q6_K, Q5_K_M, Q4_K_M, and IQ4_XS in GGUF format
- Extensive vocabulary size of 102,400 tokens
- Optimized for deployment across various frameworks including TensorRT-LLM, vLLM, and llama.cpp
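As one deployment path, the quantized files can be run locally with llama.cpp. The sketch below assumes the Hugging Face repo `LGAI-EXAONE/EXAONE-Deep-2.4B-GGUF` and a `Q4_K_M` quant filename matching the listing above; check the repo's file list for exact names:

```shell
# Fetch one quantized variant (repo and file names assumed from the listing above)
huggingface-cli download LGAI-EXAONE/EXAONE-Deep-2.4B-GGUF \
    EXAONE-Deep-2.4B-Q4_K_M.gguf --local-dir .

# Run with llama.cpp: -c sets the context window, -n caps generated tokens
./llama-cli -m EXAONE-Deep-2.4B-Q4_K_M.gguf \
    -c 32768 -n 2048 -p "Solve step by step: what is 37 * 43?"
```

Smaller quants such as IQ4_XS trade some accuracy for a lower memory footprint; Q8_0 stays closest to the original weights.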
## Core Capabilities
- Enhanced reasoning abilities for mathematical problems
- Strong performance in coding tasks
- Structured thought process with `<thought>` tags
- Competitive performance against larger models
- Efficient deployment options across multiple frameworks
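The structured thought process above relies on the model's chat format. The sketch below is an assumption based on the published EXAONE turn markers (`[|user|]`, `[|assistant|]`, `<thought>`); in practice, prefer applying the tokenizer's bundled chat template:

```python
def build_prompt(question: str) -> str:
    """Assemble a single-turn prompt in the EXAONE chat format.

    The [|user|]/[|assistant|] markers and the leading <thought> tag
    follow published EXAONE Deep usage guidance (an assumption here);
    the tokenizer's own chat template is authoritative.
    """
    return (
        f"[|user|]{question}\n"
        "[|assistant|]<thought>\n"  # forcing the thought block aids reasoning
    )

prompt = build_prompt(
    "Find x if 2x + 3 = 11. Put your final answer in \\boxed{}."
)
```

Starting the assistant turn with an open `<thought>` tag nudges the model into its reasoning mode before it commits to an answer.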
## Frequently Asked Questions
**Q: What makes this model unique?**
EXAONE-Deep-2.4B stands out for strong reasoning capability relative to its size, outperforming comparable models in its class. Grouped-Query Attention and the 32,768-token context window make it particularly effective for complex, multi-step reasoning tasks.
**Q: What are the recommended use cases?**
The model excels at mathematical reasoning and coding tasks. It is most effective with structured prompts that request step-by-step reasoning; for math problems, asking for the final answer in \boxed{} notation is recommended. It performs best when its response begins with a thought block and system prompts are kept minimal.