chatglm-6b-int4

Maintained By
THUDM

ChatGLM-6B-INT4

Property      Value
Parameters    6 billion
License       Apache-2.0 (code), Custom Model License (weights)
Paper         arXiv:2406.12793
Languages     Chinese, English

What is chatglm-6b-int4?

ChatGLM-6B-INT4 is a quantized version of the bilingual (Chinese and English) ChatGLM-6B model, intended for deployment on consumer-grade hardware. Its 28 GLM Blocks are quantized to INT4, so inference needs, in theory, only about 6 GB of VRAM (or system RAM when running on CPU). That footprint makes it potentially deployable on embedded devices such as a Raspberry Pi.
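As a rough sanity check on the 6 GB figure, the weight footprint can be estimated from the parameter count and the bits stored per weight. The sketch below is back-of-the-envelope arithmetic only: it assumes all 6 billion parameters are stored at the given width and ignores activations and the attention cache. The function name `weight_memory_gb` is illustrative, not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter count from the model summary; the real count differs slightly.
N_PARAMS = 6e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # unquantized half-precision weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # assumes every weight is stored in 4 bits

print(f"FP16 weights: ~{fp16_gb:.1f} GB")
print(f"INT4 weights: ~{int4_gb:.1f} GB")
```

The estimate for INT4 comes out well under 6 GB, which plausibly leaves headroom for layers kept at higher precision, activations, and runtime overhead.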

Implementation Details

The model is built on the General Language Model (GLM) architecture and was trained on approximately 1 trillion tokens of Chinese and English text. Quantization is applied only to the GLM Blocks; the Embedding and LM Head layers keep their original precision.
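To make the idea of INT4 weight quantization concrete, here is a minimal pure-Python sketch of a generic symmetric, per-tensor scheme that packs two 4-bit values into each byte. It is an illustration under stated assumptions, not the kernel ChatGLM-6B-INT4 actually uses: the real implementation's scaling granularity and packing layout may differ, and the names `quantize_int4` and `dequantize_int4` are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[bytes, float]:
    """Symmetric per-tensor INT4 quantization; packs two 4-bit values per byte."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # map the largest magnitude to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    if len(q) % 2:
        q.append(0)  # pad to an even count so values pair up
    # Low nibble holds the even-indexed value, high nibble the odd-indexed one.
    packed = bytes(((hi & 0xF) << 4) | (lo & 0xF) for lo, hi in zip(q[0::2], q[1::2]))
    return packed, scale


def dequantize_int4(packed: bytes, scale: float, n: int) -> List[float]:
    """Unpack 4-bit two's-complement values and rescale them to floats."""
    out: List[float] = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            signed = nibble - 16 if nibble >= 8 else nibble
            out.append(signed * scale)
    return out[:n]  # drop the padding value, if any


weights = [0.7, -0.32, 0.11, 0.0, -0.7]
packed, scale = quantize_int4(weights)
restored = dequantize_int4(packed, scale, len(weights))
print(len(packed), "bytes for", len(weights), "weights")
print([round(x, 3) for x in restored])
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy cost paid for the smaller footprint.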

  • INT4 quantization of the GLM Blocks to reduce memory use
  • Automatic compilation of CPU kernels (requires GCC and OpenMP)
  • Compatible with both GPU and CPU inference
  • Few dependencies: protobuf, transformers 4.27.1, and cpm_kernels
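A minimal loading sketch, assuming the dependencies listed above are installed (for example with `pip install protobuf transformers==4.27.1 cpm_kernels`) and that the weights are published under the Hugging Face repository ID `THUDM/chatglm-6b-int4`, which is an assumption not stated in this page. The helper name `load_chatglm_int4` is hypothetical; the `chat` method comes from the model's own remote code, which is why `trust_remote_code=True` is needed.

```python
MODEL_ID = "THUDM/chatglm-6b-int4"  # assumed Hugging Face repository ID


def load_chatglm_int4(device: str = "cuda"):
    """Load the tokenizer and model; downloads the weights on first call."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    if device == "cuda":
        model = model.half().cuda()  # half precision on the GPU
    else:
        model = model.float()        # CPU path; kernels are compiled on first use
    return tokenizer, model.eval()


# Usage (downloads several GB of weights on first run):
#   tokenizer, model = load_chatglm_int4("cuda")
#   response, history = model.chat(tokenizer, "你好", history=[])
#   print(response)
```

Passing any value other than `"cuda"` selects the CPU path in this sketch.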

Core Capabilities

  • Bilingual dialogue generation in Chinese and English
  • Human-like responses through supervised fine-tuning
  • Low memory footprint (about 6 GB of VRAM, or system RAM on CPU)
  • Optimized for consumer-grade hardware
  • Supports interactive chat with conversation history
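Conversation history can be threaded through successive turns by feeding each call's returned history into the next. The sketch below shows that pattern with a stand-in chat function so it runs without the model; it assumes the chat call returns a `(response, history)` pair where history is a list of `(query, response)` tuples. `run_dialogue` and `echo_chat` are hypothetical names for illustration.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]
ChatFn = Callable[[str, History], Tuple[str, History]]


def run_dialogue(chat_fn: ChatFn, queries: List[str]) -> History:
    """Send queries in order, passing the accumulated history to each turn."""
    history: History = []
    for query in queries:
        _response, history = chat_fn(query, history)
    return history


def echo_chat(query: str, history: History) -> Tuple[str, History]:
    """Stand-in for the real model: replies with the turn number and the query."""
    response = f"[turn {len(history) + 1}] you said: {query}"
    return response, history + [(query, response)]


for q, r in run_dialogue(echo_chat, ["Hello", "你好"]):
    print(q, "->", r)
```

With a loaded model and tokenizer, the stand-in could be swapped for something like `lambda q, h: model.chat(tokenizer, q, history=h)`.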

Frequently Asked Questions

Q: What makes this model unique?

INT4 quantization cuts the model's memory requirement to roughly 6 GB while maintaining reasonable performance. This lets a 6B-parameter model run on consumer hardware and, potentially, on embedded devices, which is uncommon for models of this size.

Q: What are the recommended use cases?

The model is ideal for deployment scenarios where computing resources are limited, such as local deployment on consumer PCs, embedded systems, or edge devices. It's particularly useful for bilingual applications requiring Chinese-English dialogue capabilities.
