EXAONE-3.5-2.4B-Instruct
| Property | Value |
|---|---|
| Parameters | 2.14B (without embeddings) |
| Context Length | 32,768 tokens |
| Architecture | 30 layers, GQA with 32 Q-heads and 8 KV-heads |
| Vocabulary Size | 102,400 |
| License | EXAONE AI Model License Agreement 1.1 - NC |
| Developer | LG AI Research |
What is EXAONE-3.5-2.4B-Instruct?
EXAONE-3.5-2.4B-Instruct is part of the EXAONE 3.5 collection of bilingual (English and Korean) instruction-tuned language models from LG AI Research. At 2.4B parameters, it is optimized for deployment on small or resource-constrained devices while remaining competitive on instruction-following benchmarks. It is particularly notable for its extended 32K-token context window and strong results in real-world use cases.
Implementation Details
The model is a 30-layer decoder that uses Grouped-Query Attention (GQA), in which 32 query heads share 8 key-value heads; this reduces KV-cache memory relative to full multi-head attention, which matters for long-context inference on constrained hardware. Word embeddings are tied with the output projection, and the 102,400-token vocabulary enables robust language understanding and generation in both English and Korean.
- Optimized for deployment on small or resource-constrained devices
- Supports long-context processing up to 32K tokens
- Implements efficient GQA attention mechanism
- Compatible with multiple inference frameworks including TensorRT-LLM, vLLM, and SGLang
Core Capabilities
- Bilingual proficiency in English and Korean
- Strong performance on MT-Bench (7.81) and LiveBench (33.0)
- Strong results on Korean-specific benchmarks (KoMT-Bench: 7.24)
- Supports conversation-style interactions with system prompts
- Quantization options available for optimized deployment
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its balance of size and performance: it is designed for resource-constrained environments yet maintains strong results in both English and Korean. Its 32K-token context window and competitive benchmark scores make it well suited to real-world applications.
Q: What are the recommended use cases?
The model excels at bilingual language understanding and generation. It is particularly well suited to resource-constrained deployments where efficiency is crucial, supports several inference frameworks, and can be quantized for a smaller memory footprint, as sketched below.