EXAONE-3.5-2.4B-Instruct

by LGAI-EXAONE

EXAONE-3.5-2.4B-Instruct is a bilingual (English/Korean) LLM with 2.4B parameters, a 32K-token context window, and state-of-the-art performance on real-world tasks.

| Property | Value |
|---|---|
| Parameters | 2.14B (without embeddings) |
| Context Length | 32,768 tokens |
| Architecture | 30 layers, GQA with 32 Q-heads and 8 KV-heads |
| Vocabulary Size | 102,400 |
| License | EXAONE AI Model License Agreement 1.1 - NC |
| Developer | LG AI Research |

What is EXAONE-3.5-2.4B-Instruct?

EXAONE-3.5-2.4B-Instruct is part of the EXAONE 3.5 collection, representing a breakthrough in bilingual (English and Korean) language models. Specifically optimized for deployment on resource-constrained devices, this 2.4B parameter model delivers impressive performance while maintaining efficiency. It's particularly notable for its extended context window of 32K tokens and state-of-the-art performance in real-world applications.

Implementation Details

The model employs a sophisticated architecture with 30 layers and uses Grouped-Query Attention (GQA) with 32 Q-heads and 8 KV-heads. It features tied word embeddings and a substantial vocabulary size of 102,400 tokens, enabling robust language understanding and generation capabilities in both English and Korean.
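The GQA layout above directly determines KV-cache cost at inference time. A minimal back-of-the-envelope sketch, using the figures from the model card (30 layers, 32 Q-heads, 8 KV-heads); the head dimension is an assumption for illustration, since the card does not state the hidden size:

```python
# Rough KV-cache cost for the GQA layout described above.
# Card figures: 30 layers, 32 Q-heads, 8 KV-heads.
# HEAD_DIM is an ASSUMPTION; the model card does not state it.

def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache per generated token (K and V, bf16/fp16 by default)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

LAYERS, Q_HEADS, KV_HEADS = 30, 32, 8
HEAD_DIM = 80  # assumed

gqa = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
mha = kv_cache_bytes_per_token(LAYERS, Q_HEADS, HEAD_DIM)  # hypothetical MHA baseline

print(f"GQA KV cache: {gqa / 1024:.1f} KiB/token")
print(f"Reduction vs. MHA: {mha / gqa:.0f}x")  # 32 Q-heads / 8 KV-heads = 4x
```

Under these assumptions, the full 32K context costs roughly 2.3 GiB of KV cache in bf16, versus about 9.4 GiB for an otherwise identical multi-head-attention model, which is a large part of why GQA suits resource-constrained deployment.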

  • Optimized for deployment on small or resource-constrained devices
  • Supports long-context processing up to 32K tokens
  • Implements efficient GQA attention mechanism
  • Compatible with multiple inference frameworks including TensorRT-LLM, vLLM, and SGLang

Core Capabilities

  • Bilingual proficiency in English and Korean
  • Strong performance on MT-Bench (7.81) and LiveBench (33.0)
  • Excellent results on Korean-specific benchmarks (KoMT-Bench: 7.24)
  • Supports conversation-style interactions with system prompts
  • Quantization options available for optimized deployment
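Conversation-style usage with a system prompt can be sketched with Hugging Face transformers as below. This is a hedged sketch, not the official quickstart: the repo id is assumed from the developer's org name, and `trust_remote_code=True` is typically needed for custom architectures like EXAONE.

```python
# Hedged sketch: chat-style generation with a system prompt via transformers.
# The repo id below is an assumption based on the card's org/model names.

def build_messages(system_prompt, user_prompt):
    """Chat messages in the role/content format used by apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def generate(user_prompt, system_prompt="You are a helpful bilingual assistant."):
    # Heavy imports kept inside the function so the helper above stays light.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(system_prompt, user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example (requires a GPU and the model weights):
#   print(generate("Explain grouped-query attention in one sentence."))
```

The same role/content messages format works with vLLM's OpenAI-compatible server or SGLang; only the loading and serving code changes.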

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its balance of size and performance: it is designed for resource-constrained environments while maintaining strong results in both English and Korean. Its 32K-token context window and competitive benchmark scores make it particularly valuable for real-world applications.

Q: What are the recommended use cases?

The model excels at bilingual applications and general language understanding and generation tasks. It is particularly well-suited to resource-constrained deployments where efficiency is crucial. It supports multiple inference frameworks (TensorRT-LLM, vLLM, SGLang) and can be quantized for a smaller memory footprint.
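The quantization option mentioned above can be sketched with transformers' `BitsAndBytesConfig`. This is a hedged sketch: the repo id is assumed, the quantization settings are common defaults rather than official recommendations, and the memory helper is a rough weights-only estimate that ignores quantization overhead and the KV cache:

```python
# Hedged sketch: 4-bit loading via bitsandbytes (BitsAndBytesConfig).
# weights_gb is a rough weights-only estimate, ignoring cache/overhead.

def weights_gb(params_billions, bits_per_weight):
    """Approximate weights-only memory footprint in GB."""
    return params_billions * bits_per_weight / 8

def load_4bit(model_id="LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"):  # assumed repo id
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_type="nf4",
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant,
        device_map="auto",
        trust_remote_code=True,
    )

print(f"~{weights_gb(2.4, 4):.1f} GB for 4-bit weights vs ~{weights_gb(2.4, 16):.1f} GB in bf16")
```

At 4 bits the 2.4B weights fit in roughly 1.2 GB, which is what makes on-device deployment of this model plausible.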
