llama3-42b-v0

Maintained By
chargoddard

  • Parameter Count: 43.2B
  • License: llama3
  • Architecture: Llama 3 (Pruned)
  • Training Data: JeanKaddour/minipile
  • Paper: The Unreasonable Ineffectiveness of the Deeper Layers

What is llama3-42b-v0?

llama3-42b-v0 is a pruned version of Meta's Llama 3 70B model, reduced to 42B parameters by removing a contiguous block of transformer layers. After pruning, this base model was further trained with QLoRA on approximately 100M tokens from the minipile dataset to recover performance lost during pruning. The result shows that substantial parameter reduction can preserve much of the original model's benchmark performance.
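
A minimal sketch of loading the model for plain text completion with Hugging Face Transformers is shown below; the repository id chargoddard/llama3-42b-v0, prompt, and generation settings are assumptions for illustration rather than details taken from this card.

```python
# Minimal sketch: load the pruned base model in BF16 for plain text completion.
# The repository id, prompt, and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chargoddard/llama3-42b-v0"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 as the tensor format
    device_map="auto",
)

# Base model without instruction tuning: prompt with plain text,
# not the Llama 3 instruct/chat template.
prompt = "The key idea behind structured layer pruning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```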

Implementation Details

The model follows the layer-pruning methodology described in "The Unreasonable Ineffectiveness of the Deeper Layers", with the layers to remove selected using PruneMe. Weights are stored in BF16, and the post-pruning QLoRA training was run with the Axolotl framework.

  • Pruned from 70B to 42B parameters while maintaining performance
  • Trained using QLoRA on minipile dataset
  • Selects which layers to remove via PruneMe block-similarity analysis (see the sketch after this list)
  • Supports PyTorch and text-generation-inference
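
The layer-selection step can be illustrated with a short sketch in the spirit of PruneMe and the paper above: compare each layer's input with the hidden state a fixed number of layers later, and treat the block whose representations change least as the candidate for removal. The source model id, sample text, and block size below are illustrative assumptions, and PruneMe aggregates these distances over a full dataset rather than a single sentence.

```python
# Illustrative sketch of block-similarity layer selection (in the spirit of
# PruneMe / "The Unreasonable Ineffectiveness of the Deeper Layers").
# The source model id, sample text, and block size are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

source_id = "meta-llama/Meta-Llama-3-70B"  # assumed source model to prune
tokenizer = AutoTokenizer.from_pretrained(source_id)
model = AutoModelForCausalLM.from_pretrained(
    source_id, torch_dtype=torch.bfloat16, device_map="auto"
)

text = "Layer pruning removes blocks of layers whose outputs barely change."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

hidden = out.hidden_states  # embeddings plus one hidden state per layer
block = 8                   # number of consecutive layers considered for removal
scores = []
for start in range(len(hidden) - block):
    a = hidden[start][:, -1, :].float()          # state entering the block
    b = hidden[start + block][:, -1, :].float()  # state leaving the block
    cos = torch.nn.functional.cosine_similarity(a, b, dim=-1).mean()
    # Smaller angular distance -> the block changes the representation least,
    # so it is the safest candidate to prune.
    scores.append((start, torch.arccos(cos.clamp(-1.0, 1.0)).item()))

best_start, best_dist = min(scores, key=lambda s: s[1])
print(f"Candidate block: layers {best_start}-{best_start + block - 1} "
      f"(angular distance {best_dist:.4f})")
```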

Core Capabilities

  • Strong MMLU performance (76.69% accuracy)
  • Exceptional social-sciences scoring (86.68% accuracy)
  • Robust performance on WinoGrande (80.27% accuracy)
  • High HellaSwag performance (80.25% normalized accuracy); a reproduction sketch follows this list
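
The scores above are taken from the model card. A hedged sketch of re-running similar benchmarks with EleutherAI's lm-evaluation-harness is shown below; the repository id, task names, and batch size are assumptions, and results will vary with harness version and few-shot settings.

```python
# Hedged sketch: re-running similar benchmarks with lm-evaluation-harness
# (v0.4+ Python API). Model id, tasks, and batch size are assumptions; scores
# will differ with harness version, prompts, and few-shot settings.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=chargoddard/llama3-42b-v0,dtype=bfloat16",
    tasks=["mmlu", "winogrande", "hellaswag"],
    batch_size=4,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```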

Frequently Asked Questions

Q: What makes this model unique?

This model demonstrates that significant parameter reduction through pruning can maintain strong performance metrics. It's particularly notable for being a base model without instruction tuning, making it suitable for custom fine-tuning projects.

Q: What are the recommended use cases?

As a base model, it is best suited for custom fine-tuning projects rather than direct deployment. It should not be prompted with the Llama 3 instruct format, since its special tokens are randomly initialized and it has had no instruction tuning. A minimal QLoRA fine-tuning setup is sketched below.
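
As an illustration of that fine-tuning path, the sketch below prepares the model for QLoRA training with Transformers, bitsandbytes, and PEFT. The repository id, LoRA hyperparameters, and target modules are illustrative assumptions rather than settings used for this model.

```python
# Sketch: preparing the pruned base model for QLoRA fine-tuning.
# The repository id, LoRA hyperparameters, and target modules are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "chargoddard/llama3-42b-v0"  # assumed Hugging Face repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, attach your own dataset and trainer (or an Axolotl config).
```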
