DeepSeek-R1-Distill-Qwen-14B-FP8-dynamic

Maintained By
neuralmagic

Property        Value
Model Type      Qwen2ForCausalLM
Developer       Neural Magic
Release Date    February 5, 2025
Quantization    FP8 (Weights & Activations)
Model URL       Hugging Face Repository

What is DeepSeek-R1-Distill-Qwen-14B-FP8-dynamic?

This model is a quantized version of DeepSeek-R1-Distill-Qwen-14B that uses FP8 for both weights and activations. Quantization cuts disk size and GPU memory requirements by approximately 50% while keeping performance comparable to the parent model.
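The 50% figure follows from simple arithmetic on bits per parameter. The sketch below is a rough estimate only: the parameter count is an assumption read off the "14B" in the model name, and the helper name `weight_memory_gb` is hypothetical. The real saving is likely to land slightly under 50%, since layers that are not quantized stay at higher precision and the quantization scales also need storage.

```python
# Rough weight-memory estimate. NOMINAL_PARAMS is an assumption taken from
# the "14B" in the model name, not a measured parameter count.
NOMINAL_PARAMS = 14e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the storage needed for the weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # 16-bit parent model
fp8_gb = weight_memory_gb(NOMINAL_PARAMS, 8)    # FP8 quantized model
reduction = 1 - fp8_gb / bf16_gb

print(f"16-bit weights: {bf16_gb:.0f} GB")
print(f"FP8 weights:    {fp8_gb:.0f} GB")
print(f"Reduction:      {reduction:.0%}")
```

This covers weights only; the KV cache and runtime buffers add to GPU memory use in both cases.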

Implementation Details

The model uses symmetric quantization: per-channel scales for weights and per-token scales for activations. Only the linear operators within transformer blocks are quantized, which preserves accuracy while significantly improving efficiency.
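As an illustration of what symmetric per-channel and per-token quantization mean, here is a minimal pure-Python sketch. It is not Neural Magic's implementation. It assumes the E4M3 variant of FP8 (largest finite value 448), which this page does not specify, and it approximates FP8 rounding by keeping four significant bits while ignoring subnormals. All function names and the toy data are hypothetical.

```python
import math

FP8_MAX = 448.0  # assumption: largest finite value of the E4M3 FP8 format


def round_to_fp8_grid(x: float) -> float:
    """Approximate E4M3 rounding: keep 4 significant bits, clip to +/-FP8_MAX.

    Subnormals and the minimum exponent are ignored for brevity.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    y = math.ldexp(round(m * 16) / 16, e)
    return max(-FP8_MAX, min(FP8_MAX, y))


def row_scales(rows):
    """Symmetric scale per row: the row's max |x| maps to FP8_MAX, 0 maps to 0."""
    scales = []
    for row in rows:
        amax = max(abs(v) for v in row)
        scales.append(amax / FP8_MAX if amax > 0 else 1.0)
    return scales


def quantize(rows, scales):
    """Divide each row by its scale and round onto the approximate FP8 grid."""
    return [[round_to_fp8_grid(v / s) for v in row] for row, s in zip(rows, scales)]


def dequantize(q_rows, scales):
    """Multiply each quantized row by its scale to recover approximate values."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Hypothetical toy data, for illustration only.
weights = [[0.10, -0.20, 0.05], [1.50, 0.30, -0.75]]   # [out_channels][in_features]
activations = [[2.0, -4.0, 1.0], [0.01, 0.02, -0.03]]  # [tokens][hidden]

w_scales = row_scales(weights)      # per-channel: one scale per output channel
a_scales = row_scales(activations)  # per-token: one scale per token row
w_q = quantize(weights, w_scales)
w_hat = dequantize(w_q, w_scales)
```

Because each row gets its own scale, a token or channel with small values is not forced to share a range with one that has large values. Weight scales can be computed once ahead of time, whereas per-token activation scales depend on the input and so have to be computed at inference time, which is presumably what "dynamic" in the model name refers to.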

  • Weight quantization reduces bits per parameter from 16 to 8
  • Achieves up to 1.4x speedup in both single-stream and multi-stream deployment
  • Compatible with vLLM backend for efficient deployment
  • Maintains 99.8% accuracy on OpenLLM V1 benchmark compared to the original model

Core Capabilities

  • Strong performance in reasoning tasks (74.29% average score)
  • Excellent coding capabilities (77.20% pass@1 on HumanEval)
  • Efficient large context handling (up to 4096 tokens)
  • Optimized for both single-stream and multi-stream inference

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for FP8 quantization that cuts disk size and GPU memory requirements by roughly 50% while retaining over 99% of the original model's performance on most benchmarks. Notably, it outperforms the original model on some reasoning tasks.
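Figures such as "over 99% of the original model's performance" are usually read as a recovery ratio: the quantized model's score divided by the baseline's. The sketch below assumes that definition, which this page does not spell out. The function name is hypothetical, and the scores are made-up inputs, not results reported for this model.

```python
def recovery_pct(quantized_score: float, baseline_score: float) -> float:
    """Return the quantized score as a percentage of the unquantized baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * quantized_score / baseline_score


# Hypothetical scores for illustration only; not results for this model.
print(f"{recovery_pct(49.9, 50.0):.1f}%")
```

Under this definition, a value above 100% means the quantized model scored higher than the original on that benchmark.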

Q: What are the recommended use cases?

The model excels in instruction following, code generation, and reasoning tasks. It's particularly well-suited for deployment scenarios where resource efficiency is crucial, showing strong performance in both single-stream and multi-stream applications.
