EXAONE-3.5-2.4B-Instruct

Maintained By: LGAI-EXAONE

Parameters: 2.14B (without embeddings)
Context Length: 32,768 tokens
Architecture: 30 layers, GQA with 32 Q-heads and 8 KV-heads
Vocabulary Size: 102,400
License: EXAONE AI Model License Agreement 1.1 - NC
Developer: LG AI Research

What is EXAONE-3.5-2.4B-Instruct?

EXAONE-3.5-2.4B-Instruct is part of the EXAONE 3.5 collection of bilingual (English and Korean) language models from LG AI Research. Specifically optimized for deployment on resource-constrained devices, this 2.4B-parameter model delivers strong performance for its size. It is particularly notable for its extended 32K-token context window and competitive results on real-world instruction-following benchmarks.

Implementation Details

The model is a 30-layer decoder-only transformer that uses Grouped-Query Attention (GQA): 32 query heads share 8 key/value heads, which shrinks the KV cache and speeds up inference. It features tied word embeddings (input and output embeddings share weights, saving parameters) and a substantial 102,400-token vocabulary, enabling robust language understanding and generation in both English and Korean.

  • Optimized for deployment on small or resource-constrained devices
  • Supports long-context processing up to 32K tokens
  • Implements efficient GQA attention mechanism
  • Compatible with multiple inference frameworks, including TensorRT-LLM, vLLM, and SGLang (a basic loading sketch follows this list)
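
As a concrete starting point, below is a minimal loading sketch using Hugging Face transformers. The model id follows the official LGAI-EXAONE naming on Hugging Face; EXAONE checkpoints ship custom modeling code, so trust_remote_code=True is needed, and the dtype and device settings here are illustrative choices rather than official recommendations.

```python
# Minimal loading sketch with Hugging Face transformers.
# Model id assumed from the official Hugging Face release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 2.4B weights around 5 GB
    device_map="auto",
    trust_remote_code=True,      # EXAONE ships custom modeling code
)

# The spec-table values (30 layers, 32 Q-heads / 8 KV-heads,
# 102,400-token vocabulary, 32K context) should be visible here:
print(model.config)
```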

Core Capabilities

  • Bilingual proficiency in English and Korean
  • Strong performance on MT-Bench (7.81) and LiveBench (33.0)
  • Excellent results on Korean-specific benchmarks (KoMT-Bench: 7.24)
  • Supports conversation-style interactions with system prompts (see the sketch after this list)
  • Quantization options available for optimized deployment
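
To illustrate the conversation-style interface, the sketch below reuses the tokenizer and model from the loading sketch above and relies on the chat template bundled with the tokenizer. The system prompt wording and generation settings are assumptions for illustration, not official recommendations.

```python
# Conversation-style generation with a system prompt
# (continues from the loading sketch above).
messages = [
    {"role": "system",
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user",
     "content": "Explain gradient descent in two sentences."},
]

# The tokenizer's bundled chat template formats the turns for the model.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```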

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its balance of size and performance: it is designed for resource-constrained environments while maintaining strong results in both English and Korean. Its 32K-token context window and competitive benchmark scores make it particularly valuable for real-world applications.

Q: What are the recommended use cases?

The model excels at bilingual applications and general language understanding and generation tasks. It is especially well suited to resource-constrained deployments where efficiency matters; it supports several inference frameworks and can be quantized for a smaller memory footprint, as sketched below.
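
As one generic illustration of the quantization mentioned above (not an official EXAONE recipe; check the model card for any released pre-quantized variants), the model can be loaded in 4-bit precision via transformers with bitsandbytes:

```python
# Generic 4-bit loading path via transformers + bitsandbytes
# (illustrative only; requires `pip install bitsandbytes`).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```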
