Sparse-Llama-3.1-8B-2of4

Maintained By
neuralmagic


Property: Value
Parameter Count: 8.03B
Model Type: Text Generation
Architecture: Llama-3.1-8B with 2:4 Sparsity
License: llama3.1
Developer: Neural Magic
Research Paper: SparseGPT, SquareHead

What is Sparse-Llama-3.1-8B-2of4?

Sparse-Llama-3.1-8B-2of4 is a pruned version of the Llama-3.1-8B model that uses a 2:4 sparsity pattern: in every group of four weights, two are pruned (set to zero). The model stays close to the original in accuracy, with 98.37% accuracy recovery on the OpenLLM benchmark and 97.3% on the Mosaic Eval Gauntlet.
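To make the 2:4 pattern concrete, here is a minimal sketch in plain Python. It assumes a simple magnitude rule (keep the two largest-magnitude weights in each group of four) purely for illustration; SparseGPT selects which weights to prune using second-order information, so this is not the procedure used to build the model. The function names `prune_2of4` and `is_2of4_sparse` are hypothetical.

```python
from typing import List


def prune_2of4(weights: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in each group of four.

    Illustrative magnitude rule only; not the SparseGPT selection criterion.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def is_2of4_sparse(weights: List[float]) -> bool:
    """Return True if every group of four contains at least two zeros."""
    return len(weights) % 4 == 0 and all(
        sum(1 for w in weights[s:s + 4] if w == 0.0) >= 2
        for s in range(0, len(weights), 4)
    )


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_2of4(row))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the zeros follow a fixed two-in-four layout, only half of the weights in each pruned matrix need to be stored and multiplied, which is what hardware and runtimes with 2:4 support can exploit.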

Implementation Details

The model combines the SparseGPT and SquareHead approaches. All linear operators within the transformer blocks were pruned, and the model was then trained with knowledge distillation for 13B tokens to recover accuracy. It is designed for efficient deployment with the vLLM backend.

  • Utilizes 2:4 sparsity pattern in transformer blocks
  • Trained with knowledge distillation for accuracy recovery
  • Optimized for vLLM deployment
  • BF16 tensor type for efficient computation
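The distillation step can be pictured with a small sketch of a SquareHead-style loss. It assumes the commonly described form, in which each layer's mean squared error between student and teacher hidden states is normalized by the teacher's own squared magnitude and the results are averaged over layers. The exact training recipe for this model is not given here, and the names `mse`, `squarehead_loss`, and `eps` are illustrative.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def squarehead_loss(
    student_layers: Sequence[Sequence[float]],
    teacher_layers: Sequence[Sequence[float]],
    eps: float = 1e-6,
) -> float:
    """Average over layers of MSE(student, teacher) / MSE(teacher, 0).

    Assumed form of the loss; eps guards against division by zero.
    """
    per_layer = []
    for s, t in zip(student_layers, teacher_layers):
        zeros = [0.0] * len(t)
        per_layer.append(mse(s, t) / (mse(t, zeros) + eps))
    return sum(per_layer) / len(per_layer)


# Toy hidden states for two layers with very different scales.
teacher = [[1.0, 2.0], [10.0, 20.0]]
student = [[0.9, 2.1], [9.0, 21.0]]
print(squarehead_loss(student, teacher))
```

The per-layer normalization keeps layers with large activations from dominating the total, so a student that matches the teacher exactly scores 0 and a student that outputs all zeros scores close to 1 on every layer.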

Core Capabilities

  • Average score of 62.16 on the OpenLLM benchmark
  • Strong performance in GSM8K (56.3) and ARC-C (59.4)
  • Effective language understanding with 69.0 score on relevant tasks
  • Efficient deployment through vLLM with OpenAI-compatible serving
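The 62.16 average and the 98.37% recovery figure can be related with a short calculation. This sketch assumes that accuracy recovery means the sparse model's score divided by the dense model's score, expressed as a percentage. The dense baseline is not stated on this page, so it is derived from the two quoted numbers rather than quoted directly.

```python
def accuracy_recovery(sparse_score: float, dense_score: float) -> float:
    """Recovery in percent, assuming recovery = sparse / dense."""
    return 100.0 * sparse_score / dense_score


def implied_dense_score(sparse_score: float, recovery_pct: float) -> float:
    """Dense-model score implied by a sparse score and a recovery percentage."""
    return sparse_score * 100.0 / recovery_pct


dense = implied_dense_score(62.16, 98.37)
print(round(dense, 2))  # 63.19
print(round(accuracy_recovery(62.16, dense), 2))  # 98.37
```

Under that assumption, the quoted figures imply a dense OpenLLM average of about 63.19, roughly one point above the sparse model.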

Frequently Asked Questions

Q: What makes this model unique?

The model keeps accuracy close to that of the original Llama-3.1-8B while applying a 2:4 sparsity pattern, which reduces computational requirements without significant accuracy loss.

Q: What are the recommended use cases?

This model is particularly suitable for deployment scenarios where efficiency is crucial while maintaining high accuracy. It's ideal for text generation tasks, especially in environments that can leverage vLLM backend optimization.
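As a sketch of the vLLM deployment path, the snippet below builds and sends a request to vLLM's OpenAI-compatible completions endpoint using only the Python standard library. The model id `neuralmagic/Sparse-Llama-3.1-8B-2of4` is assumed from the maintainer and model name on this page, and the address `http://localhost:8000/v1` assumes a server started locally on vLLM's default port. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed Hugging Face model id (maintainer + model name from this page).
MODEL_ID = "neuralmagic/Sparse-Llama-3.1-8B-2of4"
# Assumed local address of a running vLLM OpenAI-compatible server.
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for a POST to the /completions endpoint."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def send_completion_request(payload: dict, base_url: str = BASE_URL) -> str:
    """Send the request and return the generated text of the first choice."""
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With a server running (for example, started with `vllm serve neuralmagic/Sparse-Llama-3.1-8B-2of4`), calling `send_completion_request(build_completion_request("The capital of France is"))` returns the generated continuation.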
