EXAONE-3.5-2.4B-Instruct

Maintained By: LGAI-EXAONE

Parameters: 2.14B (without embeddings)
Context Length: 32,768 tokens
Architecture: 30 layers, GQA with 32 Q-heads and 8 KV-heads
Vocabulary Size: 102,400
License: EXAONE AI Model License Agreement 1.1 - NC
Developer: LG AI Research

What is EXAONE-3.5-2.4B-Instruct?

EXAONE-3.5-2.4B-Instruct is part of the EXAONE 3.5 collection of bilingual (English and Korean) language models from LG AI Research. Specifically optimized for deployment on resource-constrained devices, this 2.4B-parameter model delivers strong performance for its size. It is particularly notable for its extended 32K-token context window and competitive results on real-world instruction-following benchmarks.

Implementation Details

The model is a 30-layer decoder-only transformer that uses Grouped-Query Attention (GQA): 32 query heads share 8 key/value heads, which shrinks the KV cache and speeds up inference. It features tied word embeddings (input and output embeddings share weights, saving parameters) and a substantial 102,400-token vocabulary, enabling robust language understanding and generation in both English and Korean.

  • Optimized for deployment on small or resource-constrained devices
  • Supports long-context processing up to 32K tokens
  • Implements efficient GQA attention mechanism
  • Compatible with multiple inference frameworks, including TensorRT-LLM, vLLM, and SGLang (a basic loading sketch follows this list)
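
As a concrete starting point, below is a minimal loading sketch using Hugging Face transformers. The model id follows the official LGAI-EXAONE naming on Hugging Face; EXAONE checkpoints ship custom modeling code, so trust_remote_code=True is needed, and the dtype and device settings here are illustrative choices rather than official recommendations.

```python
# Minimal loading sketch with Hugging Face transformers.
# Model id assumed from the official Hugging Face release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 2.4B weights around 5 GB
    device_map="auto",
    trust_remote_code=True,      # EXAONE ships custom modeling code
)

# The spec-table values (30 layers, 32 Q-heads / 8 KV-heads,
# 102,400-token vocabulary, 32K context) should be visible here:
print(model.config)
```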

Core Capabilities

  • Bilingual proficiency in English and Korean
  • Strong performance on MT-Bench (7.81) and LiveBench (33.0)
  • Excellent results on Korean-specific benchmarks (KoMT-Bench: 7.24)
  • Supports conversation-style interactions with system prompts (see the sketch after this list)
  • Quantization options available for optimized deployment
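
To illustrate the conversation-style interface, the sketch below reuses the tokenizer and model from the loading sketch above and relies on the chat template bundled with the tokenizer. The system prompt wording and generation settings are assumptions for illustration, not official recommendations.

```python
# Conversation-style generation with a system prompt
# (continues from the loading sketch above).
messages = [
    {"role": "system",
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user",
     "content": "Explain gradient descent in two sentences."},
]

# The tokenizer's bundled chat template formats the turns for the model.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```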

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its balance of size and performance: it is designed for resource-constrained environments while maintaining strong results in both English and Korean. Its 32K-token context window and competitive benchmark scores make it particularly valuable for real-world applications.

Q: What are the recommended use cases?

The model excels at bilingual applications and general language understanding and generation tasks. It is especially well suited to resource-constrained deployments where efficiency matters; it supports several inference frameworks and can be quantized for a smaller memory footprint, as sketched below.
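
As one generic illustration of the quantization mentioned above (not an official EXAONE recipe; check the model card for any released pre-quantized variants), the model can be loaded in 4-bit precision via transformers with bitsandbytes:

```python
# Generic 4-bit loading path via transformers + bitsandbytes
# (illustrative only; requires `pip install bitsandbytes`).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```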
