DeepSeek-V3-AWQ

cognitivecomputations

DeepSeek-V3-AWQ is an AWQ-quantized version of DeepSeek V3 for efficient inference on GPU clusters, with a reported 48 tokens per second (TPS) on 8x H100 and 38 TPS on 8x A100 GPUs.

Property        Value
Author          cognitivecomputations
Model Type      Quantized Language Model
Hugging Face    Repository Link

What is DeepSeek-V3-AWQ?

DeepSeek-V3-AWQ is a version of the DeepSeek V3 language model quantized with AWQ (Activation-aware Weight Quantization). It also includes modifications that address overflow issues under float16 precision, which makes it more stable to deploy.
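To make the float16 overflow concern concrete, here is a minimal sketch using only the Python standard library. It shows that half precision cannot hold finite values above 65504, so an intermediate product can overflow even when both inputs are small. The `scaled_product` helper is hypothetical and only illustrates one generic mitigation; this page does not describe how the model's code was actually modified.

```python
import math
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_float16(x: float) -> bool:
    """Return True if x is finite and can be stored as a finite float16."""
    if not math.isfinite(x):
        return False
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


def scaled_product(a: float, b: float, scale: float) -> float:
    """Hypothetical mitigation: shrink one operand before multiplying.

    The caller is expected to undo the scale later in a wider dtype.
    """
    return (a / scale) * b


# 300 * 300 = 90000 exceeds the float16 range, the scaled product does not.
print(fits_in_float16(300.0 * 300.0))
print(fits_in_float16(scaled_product(300.0, 300.0, 16.0)))
```

The point of the sketch is only that the overflow comes from the limited range of the format, not from the size of the inputs.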

Implementation Details

The model has been engineered for high-performance inference, with modifications to the codebase that prevent overflow in float16 operations. It can be deployed using vLLM with support for Multi-head Latent Attention (MLA), which enables full context length utilization on 8x 80GB GPU setups.

  • Supports deployment on multiple GPU configurations
  • Modified for stable float16 operations
  • Optimized for vLLM deployment
  • Enables full context length with MLA support
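The deployment described above could be launched roughly as follows. This is a sketch, not the author's documented command: the model id `cognitivecomputations/DeepSeek-V3-AWQ` is assumed from the author and model name on this page, the value 8 for tensor parallelism is assumed from the 8-GPU setups mentioned, and the two 65536 limits are the figures quoted on this page. The helper `build_vllm_serve_command` is a hypothetical name; it only assembles the argument list and does not require vLLM to be installed.

```python
import shlex


def build_vllm_serve_command(
    model: str = "cognitivecomputations/DeepSeek-V3-AWQ",  # assumed repo id
    tensor_parallel_size: int = 8,         # assumed: one shard per GPU
    max_model_len: int = 65536,            # maximum model length from this page
    max_num_batched_tokens: int = 65536,   # batched-token limit from this page
    dtype: str = "float16",
) -> list[str]:
    """Assemble an argv list for the `vllm serve` CLI."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-model-len", str(max_model_len),
        "--max-num-batched-tokens", str(max_num_batched_tokens),
        "--dtype", dtype,
        "--trust-remote-code",
    ]


command = build_vllm_serve_command()
print(shlex.join(command))  # shell-quoted form, ready to paste into a terminal
```

Passing the list to `subprocess.run(command)` would start the server on a machine where vLLM is installed. Check the flags against the vLLM version in use before relying on them.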

Core Capabilities

  • Achieves 48 TPS on 8x H100 GPUs
  • Delivers 38 TPS on 8x A100 GPUs
  • Supports maximum model length of 65536 tokens
  • Efficient batch processing with up to 65536 batched tokens
  • Superior performance at low batch sizes compared to FP8
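To put the throughput figures in more familiar terms, the short sketch below converts tokens per second into per-token latency and total generation time. It assumes the quoted TPS values describe decoding for a single request, which this page does not state explicitly, and the 1000-token response length is an arbitrary example.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decoding rate, in seconds."""
    return num_tokens / tokens_per_second


# TPS figures quoted on this page; single-stream decoding is an assumption.
for name, tps in (("8x H100", 48.0), ("8x A100", 38.0)):
    print(
        f"{name}: {ms_per_token(tps):.1f} ms/token, "
        f"{seconds_to_generate(1000, tps):.1f} s per 1000 tokens"
    )
```

Under these assumptions, 48 TPS is about 21 ms per token and 38 TPS is about 26 ms per token.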

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its optimized quantization that maintains high performance while reducing memory requirements. It's particularly effective for low batch size operations, where it outperforms FP8 models.
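A back-of-the-envelope estimate helps show why the reduced memory footprint matters on an 8x 80GB setup. The sketch below rests on assumptions that are not stated on this page: a total of 671 billion parameters for DeepSeek V3, and 4 bits per weight for the AWQ checkpoint. It counts weights only and ignores quantization scales, the KV cache, and activations, so real usage will be higher.

```python
GB = 1e9  # decimal gigabytes, matching how GPU memory is usually quoted


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to store the weights alone, in GB."""
    return num_params * bits_per_weight / 8 / GB


NUM_PARAMS = 671e9   # assumption: DeepSeek V3 total parameter count
CLUSTER_GB = 8 * 80  # eight 80GB GPUs, as described on this page

# Bit widths are illustrative; the 4-bit figure for AWQ is an assumption.
for label, bits in (("float16", 16), ("FP8", 8), ("4-bit AWQ (assumed)", 4)):
    need = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{label}: about {need:.1f} GB of weights, "
          f"fits in {CLUSTER_GB} GB: {need < CLUSTER_GB}")
```

Whatever memory the weights leave free is what remains for the KV cache, which is what a long context length consumes.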

Q: What are the recommended use cases?

The model is well suited to production deployments that need efficient inference on GPU clusters, particularly at low batch sizes. It is especially suited to applications that need the full context length on 8x 80GB GPU setups.
