ChatGLM-6B-INT4
| Property | Value |
|---|---|
| Parameters | 6 billion |
| License | Apache-2.0 (code), custom model license (weights) |
| Paper | arXiv:2406.12793 |
| Languages | Chinese, English |
What is ChatGLM-6B-INT4?
ChatGLM-6B-INT4 is a quantized version of the ChatGLM-6B model, designed for efficient deployment on consumer-grade hardware. This bilingual language model applies INT4 quantization to its 28 GLM Blocks, enabling inference with as little as 6 GB of GPU memory or CPU RAM, which makes it practical even on embedded devices such as a Raspberry Pi.
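As a concrete starting point, the sketch below follows the standard ChatGLM loading pattern (`AutoModel` with `trust_remote_code=True`, since the checkpoint ships its own modeling code). The repository id `THUDM/chatglm-6b-int4` is the commonly published Hugging Face path; adjust it if your copy lives elsewhere.

```python
# Minimal GPU inference sketch for ChatGLM-6B-INT4.
# Assumes the weights are available at "THUDM/chatglm-6b-int4".
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()

# model.chat() is the conversational entry point exposed by the custom model code;
# it returns the reply together with the updated conversation history.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```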
Implementation Details
The model is built on the General Language Model (GLM) architecture and was trained on approximately 1T tokens of Chinese and English text. Quantization targets the 28 GLM Blocks only; the Embedding and LM Head layers are kept at their original precision.
- INT4 quantization for a reduced memory footprint
- Automatic CPU kernel compilation with OpenMP support
- Runs on both GPU and CPU (see the CPU sketch after this list)
- Minimal dependencies: protobuf, transformers==4.27.1, cpm_kernels
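For CPU-only machines, the same checkpoint can be loaded in float32; on first use the bundled code compiles its INT4 CPU kernels with a local compiler and OpenMP, per the list above. A minimal sketch, again assuming the `THUDM/chatglm-6b-int4` repository id:

```python
# CPU inference sketch: load in float32 instead of half precision.
# On first run the custom code attempts to compile its INT4 CPU kernels
# (requires a C compiler with OpenMP); failures fall back to a slower path.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
model = model.eval()

response, _ = model.chat(tokenizer, "Hello, who are you?", history=[])
print(response)
```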
Core Capabilities
- Bilingual dialogue generation in Chinese and English
- Human-like responses through supervised fine-tuning
- Efficient memory usage (as little as 6 GB of VRAM)
- Optimized for consumer-grade hardware
- Supports interactive multi-turn chat with conversation history (see the example after this list)
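Conversation history is carried explicitly by the caller: `model.chat` returns an updated history that you pass back on the next turn. A short sketch of the loop, reusing `model` and `tokenizer` from the earlier examples (the prompts are illustrative):

```python
# Multi-turn chat sketch: history is a list of (query, response) pairs
# threaded through successive model.chat() calls.
history = []
for query in ["Briefly introduce yourself.", "Now say that again in Chinese."]:
    response, history = model.chat(tokenizer, query, history=history)
    print(f"User: {query}\nModel: {response}\n")
```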
Frequently Asked Questions
Q: What makes this model unique?
The model's INT4 quantization makes it exceptionally memory-efficient while maintaining reasonable quality, enabling deployment on consumer hardware and embedded devices, a capability uncommon among 6B-parameter models.
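The saving is easy to sanity-check with back-of-the-envelope arithmetic. This rough estimate covers weights only and ignores activations, the KV cache, and the full-precision Embedding and LM Head layers:

```python
# Rough weight-memory estimate for 6B parameters at different precisions.
params = 6e9
for name, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB of weights")

# INT4 weights come to roughly 2.8 GiB, leaving headroom within the 6 GB
# figure for activations, the KV cache, and the unquantized layers.
```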
Q: What are the recommended use cases?
The model is ideal for deployment scenarios where computing resources are limited, such as local deployment on consumer PCs, embedded systems, or edge devices. It's particularly useful for bilingual applications requiring Chinese-English dialogue capabilities.