Qwen-7B-Chat-Int4

Qwen-7B-Chat-Int4 is a 4-bit quantized version of the Qwen-7B-Chat model; listed at 2.11B parameters, it offers efficient, low-memory inference while maintaining strong performance across multiple languages and tasks.

| Property | Value |
|---|---|
| Parameter Count | 2.11B parameters |
| Model Type | Quantized Chat Model |
| Architecture | 32 layers, 32 heads, 4096 d_model |
| License | Tongyi Qianwen License Agreement |
| Supported Languages | Chinese, English, Multi-lingual |

What is Qwen-7B-Chat-Int4?

Qwen-7B-Chat-Int4 is a 4-bit quantized version of the Qwen-7B-Chat model, designed for efficient deployment while maintaining impressive performance. The model is built on a Transformer architecture and has been trained on diverse datasets including web texts, professional books, and code repositories. This quantized version significantly reduces memory usage while preserving most of the original model's capabilities.
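The memory savings from 4-bit quantization can be sketched with simple arithmetic. The snippet below is an illustration, not a measurement: it assumes a weight count on the order of 7.7B for the full Qwen-7B model (an approximate figure) and counts raw weight storage only, ignoring the per-group scales and zero-points that real quantized checkpoints also carry.

```python
def weight_storage_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights at a given precision, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1024**3

# Approximate weight count for the full-precision Qwen-7B model.
N = 7.7e9
fp16 = weight_storage_gib(N, 16)  # roughly 14.3 GiB
int4 = weight_storage_gib(N, 4)   # roughly 3.6 GiB
print(f"fp16: {fp16:.1f} GiB, int4: {int4:.1f} GiB ({fp16 / int4:.0f}x smaller)")
```

In practice the on-disk and in-memory footprints are somewhat larger than the int4 figure, since quantization metadata, embeddings kept at higher precision, and activation/KV-cache memory all add overhead.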

Implementation Details

The model implements advanced technical features including RoPE relative position encoding, SwiGLU activation functions, and RMSNorm. It uses a vocabulary of approximately 150K tokens optimized for Chinese, English, and code, built on the cl100k_base BPE vocabulary (the encoding used by GPT-4) and extended from there.

  • Architecture: 32 layers, 32 attention heads, 4096 model dimension (d_model)
  • Context Length: 8192 tokens
  • Memory Usage: 8.21GB for encoding 2048 tokens
  • Inference Speed: 50.09 tokens/s for 2048 tokens with Flash Attention v2
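Of the features listed above, RMSNorm is simple enough to sketch in a few lines. This is an illustrative pure-Python version, not the model's actual implementation: unlike LayerNorm, it does not subtract the mean or add a bias, only rescaling by the root-mean-square of the vector.

```python
import math

def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: rescale by the root-mean-square of x, then apply a learned gain.

    No mean-centering and no bias term, which is what distinguishes it
    from standard LayerNorm.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

h = [1.0, -2.0, 3.0, -4.0]
normed = rms_norm(h, [1.0] * len(h))  # mean of squares is ~1 after normalization
```

In the real model this runs per hidden vector (d_model = 4096) with a learned `weight` gain, typically before each attention and MLP block.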

Core Capabilities

  • Strong performance in Chinese (59.7% on C-Eval) and English (55.8% on MMLU) evaluations
  • Code generation capabilities with 37.2% Pass@1 on HumanEval
  • Mathematical reasoning with 50.3% accuracy on GSM8K
  • Tool usage and ReAct prompting support
  • Efficient inference with reduced memory footprint
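The ReAct prompting support mentioned above interleaves Thought/Action/Observation steps so the model can call external tools. The sketch below builds a generic ReAct-style prompt; the template text and the `build_react_prompt` helper are illustrative assumptions, not the exact format shipped in the Qwen repository.

```python
REACT_TEMPLATE = """Answer the following questions as best you can. You have access to the following tools:

{tool_descs}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Question: {query}"""

def build_react_prompt(query: str, tools: list[tuple[str, str]]) -> str:
    """Fill the ReAct template from (name, description) tool pairs."""
    tool_descs = "\n".join(f"{name}: {desc}" for name, desc in tools)
    tool_names = ", ".join(name for name, _ in tools)
    return REACT_TEMPLATE.format(tool_descs=tool_descs,
                                 tool_names=tool_names, query=query)

prompt = build_react_prompt("What is the weather in Beijing?",
                            [("search", "search the web for current information")])
```

At inference time the caller generates until an `Observation:` line, executes the named tool with the given input, appends the result, and resumes generation.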

Frequently Asked Questions

Q: What makes this model unique?

The model combines efficient 4-bit quantization with strong multi-lingual capabilities and tool usage abilities, making it particularly suitable for deployment in resource-constrained environments while maintaining high performance.
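The storage side of 4-bit quantization comes down to bit-packing: two 4-bit values fit in one byte. The round-trip below is a minimal sketch of that idea only; real GPTQ-style checkpoints additionally store per-group scales and zero-points used to dequantize the packed integers back to floats.

```python
def pack_int4(values: list[int]) -> bytes:
    """Pack an even-length list of 4-bit unsigned values (0..15) into bytes."""
    assert len(values) % 2 == 0 and all(0 <= v < 16 for v in values)
    return bytes((values[i] << 4) | values[i + 1]
                 for i in range(0, len(values), 2))

def unpack_int4(packed: bytes) -> list[int]:
    """Recover the original 4-bit values, two per byte."""
    out: list[int] = []
    for b in packed:
        out.extend(((b >> 4) & 0xF, b & 0xF))
    return out

vals = [3, 15, 0, 7, 9, 1]
assert unpack_int4(pack_int4(vals)) == vals  # lossless round-trip
```

Packing halves-of-bytes like this is why the weight tensors take roughly a quarter of the fp16 footprint.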

Q: What are the recommended use cases?

The model excels in multi-lingual chat applications, code generation, mathematical problem-solving, and tool-augmented tasks. It's particularly suitable for deployment scenarios where memory efficiency is crucial.
