Sparse-Llama-3.1-8B-2of4


Optimized 8B parameter LLM with 2:4 sparsity pattern, achieving 98.37% accuracy recovery compared to dense model. Efficient for deployment via vLLM.

| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Model Type | Text Generation |
| Architecture | Llama-3.1-8B with 2:4 Sparsity |
| License | llama3.1 |
| Developer | Neural Magic |
| Research Papers | SparseGPT, SquareHead |

What is Sparse-Llama-3.1-8B-2of4?

Sparse-Llama-3.1-8B-2of4 is an optimized version of the Llama-3.1-8B model that implements a 2:4 sparsity pattern, in which two of every four consecutive weights are pruned while near-original performance is preserved. The model recovers 98.37% of the dense model's accuracy on the OpenLLM benchmark and 97.3% on the Mosaic Eval Gauntlet.
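The 2:4 constraint means that within every contiguous group of four weights, at most two are non-zero, a structure that sparse tensor cores on recent GPUs can accelerate. A minimal NumPy sketch of checking this constraint (the function name is illustrative, not part of any library):

```python
import numpy as np

def is_2of4_sparse(weights: np.ndarray) -> bool:
    """Return True if every contiguous group of 4 weights has at most 2 non-zeros."""
    groups = weights.reshape(-1, 4)               # view weights in groups of four
    nonzeros_per_group = (groups != 0).sum(axis=1)
    return bool((nonzeros_per_group <= 2).all())

# A row satisfying the pattern: two of every four weights are pruned to zero.
sparse_row = np.array([0.5, 0.0, -1.2, 0.0, 0.0, 0.3, 0.0, 0.8])
dense_row = np.ones(8)

print(is_2of4_sparse(sparse_row))  # True
print(is_2of4_sparse(dense_row))   # False
```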

Implementation Details

The model combines SparseGPT pruning with SquareHead knowledge distillation. All linear operators within the transformer blocks were pruned to the 2:4 pattern, then the model was trained with knowledge distillation on 13B tokens to recover accuracy. The implementation is designed for efficient deployment on the vLLM backend.

  • Utilizes 2:4 sparsity pattern in transformer blocks
  • Trained with knowledge distillation for accuracy recovery
  • Optimized for vLLM deployment
  • BF16 tensor type for efficient computation
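The accuracy-recovery step trains the pruned student model to match a dense teacher. SquareHead distillation uses per-layer feature losses; the sketch below shows only the generic temperature-scaled logit-distillation objective that underlies such recipes, with illustrative names and no dependence on the actual training code:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """KL divergence between softened teacher and student distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as in standard knowledge distillation.
    """
    p = softmax(teacher_logits, temperature)   # teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
    return float(kl * temperature * temperature)

teacher = np.array([[2.0, 0.5, -1.0, 0.1]])
student = np.array([[1.5, 0.7, -0.8, 0.0]])
print(distillation_loss(student, teacher))   # small positive value
print(distillation_loss(teacher, teacher))   # 0.0 when distributions match
```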

Core Capabilities

  • Maintains 62.16 average score on OpenLLM benchmark
  • Strong performance in GSM8K (56.3) and ARC-C (59.4)
  • Effective language understanding with 69.0 score on relevant tasks
  • Efficient deployment through vLLM with OpenAI-compatible serving
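Deployment via vLLM's OpenAI-compatible server might look like the following sketch (assumes vLLM is installed, a compatible GPU is available, and the default port 8000; adjust flags for your environment):

```shell
# Start an OpenAI-compatible server for the sparse model.
vllm serve neuralmagic/Sparse-Llama-3.1-8B-2of4 --dtype bfloat16

# Query it with the standard chat completions endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "neuralmagic/Sparse-Llama-3.1-8B-2of4",
       "messages": [{"role": "user", "content": "Hello!"}]}'
```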

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to maintain near-original performance while implementing a 2:4 sparsity pattern, effectively reducing computational requirements without significant accuracy loss.

Q: What are the recommended use cases?

This model is particularly suitable for deployment scenarios where efficiency is crucial while maintaining high accuracy. It's ideal for text generation tasks, especially in environments that can leverage vLLM backend optimization.
