glm-4-9b-chat-1m-GGUF

Maintained By
bartowski

GLM-4-9B Chat Model (GGUF)

  • Parameter Count: 9.48B
  • License: GLM-4
  • Languages: Chinese, English
  • Base Model: THUDM/glm-4-9b-chat-1m

What is glm-4-9b-chat-1m-GGUF?

This is a quantized version of the GLM-4 9B chat model, optimized for efficient deployment across various hardware configurations. The model has been converted to GGUF format using llama.cpp, offering multiple quantization options ranging from 3.96GB to 18.97GB in size.
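As a minimal sketch, a single quantization can be fetched with the huggingface_hub Python API. The repository id and file name below are assumptions based on bartowski's usual naming convention; check the repository's file listing for the exact names.

```python
# Sketch: download one quantization file (Q4_K_M here) from the repository.
# repo_id and filename are assumptions following bartowski's naming convention;
# confirm them against the repository's file listing before use.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="bartowski/glm-4-9b-chat-1m-GGUF",
    filename="glm-4-9b-chat-1m-Q4_K_M.gguf",
    local_dir="./models",
)
print(model_path)  # local path to the downloaded GGUF file
```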

Implementation Details

The model is offered at multiple quantization levels calibrated with an importance matrix (imatrix), with specific formats targeting different use cases and hardware constraints. Certain variants (the "_L" sizes such as Q6_K_L) keep the embedding and output weights at Q8_0 to preserve quality while reducing overall size.

  • Multiple quantization options (Q2 to Q8_0)
  • Specialized formats for ARM and CPU inference
  • Optimized versions for different RAM/VRAM configurations
  • Custom prompt format for system and user interactions (sketched below)
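
For reference, the GLM-4 chat template wraps system and user turns in special tokens. The sketch below builds the prompt string by hand; the tokens are reproduced from the GLM-4 chat template rather than from this repository, so verify them against the card's stated prompt format.

```python
# Minimal sketch of the GLM-4 chat prompt layout used for system/user turns.
# The special tokens ([gMASK]<sop>, <|system|>, <|user|>, <|assistant|>) follow
# the GLM-4 chat template; verify them against the repository's prompt format.
def build_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        "[gMASK]<sop><|system|>\n"
        f"{system_prompt}<|user|>\n"
        f"{user_prompt}<|assistant|>"
    )

print(build_prompt("You are a helpful assistant.", "Summarize GGUF in one sentence."))
```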

Core Capabilities

  • Bilingual chat functionality (Chinese and English)
  • Efficient memory usage through various quantization options
  • Optimized performance on different hardware configurations
  • Support for system prompts and user interactions (see the example below)
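
A short usage sketch with the llama-cpp-python bindings, assuming the Q4_K_M file downloaded earlier; create_chat_completion applies the chat template embedded in the GGUF metadata, so the special tokens do not need to be written by hand.

```python
# Sketch: bilingual chat with llama-cpp-python (pip install llama-cpp-python).
# The model path assumes the Q4_K_M file from the download example above.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/glm-4-9b-chat-1m-Q4_K_M.gguf",
    n_ctx=8192,       # working context; raise it if you need the long-context capability
    n_gpu_layers=-1,  # offload all layers to GPU; set to 0 for CPU-only inference
)

messages = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "请用中文和英文各写一句问候。"},  # one greeting in Chinese, one in English
]

response = llm.create_chat_completion(messages=messages, max_tokens=128)
print(response["choices"][0]["message"]["content"])
```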

Frequently Asked Questions

Q: What makes this model unique?

The model offers exceptional flexibility through multiple quantization options, allowing users to balance between model size and performance based on their hardware capabilities. It's particularly notable for its bilingual capabilities and optimized GGUF format implementation.

Q: What are the recommended use cases?

For users with limited VRAM (4-6GB), the Q4_K_M or Q4_K_S variants offer a good balance of size and quality. Users with more headroom can choose higher-quality quantizations such as Q6_K_L for near-perfect output quality. The model is particularly suitable for bilingual applications that require both Chinese and English language capabilities.
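
A rough selection heuristic, expressed as code: the quantization names come from this card's recommendations, but the VRAM thresholds are assumptions rather than figures from the repository, so treat them as a starting point and leave 1-2GB of headroom over the file size.

```python
# Illustrative helper only: maps available VRAM (GB) to a quantization name
# mentioned on this card. Thresholds are assumptions, not published figures.
def pick_quant(vram_gb: float) -> str:
    if vram_gb >= 12:
        return "Q8_0"    # highest-quality quantization listed
    if vram_gb >= 10:
        return "Q6_K_L"  # near-perfect quality per the card's recommendation
    if vram_gb >= 6:
        return "Q4_K_M"  # recommended default for limited VRAM
    if vram_gb >= 4:
        return "Q4_K_S"  # slightly smaller alternative for 4-6GB cards
    return "Q2_K"        # assumption: smallest listed tier (Q2) for very tight memory

print(pick_quant(8))  # -> "Q4_K_M"
```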
