llama3-42b-v0
| Property | Value |
|---|---|
| Parameter Count | 43.2B |
| License | llama3 |
| Architecture | Llama 3 (Pruned) |
| Training Data | JeanKaddour/minipile |
| Paper | The Unreasonable Ineffectiveness of the Deeper Layers |
What is llama3-42b-v0?
llama3-42b-v0 is a pruned version of Meta's Llama 3 70B model, reduced to roughly 42B (43.2B) parameters via layer pruning. After pruning, this base model was trained with QLoRA on approximately 100M tokens from the minipile dataset to recover performance. It demonstrates substantial model compression while retaining strong benchmark scores.
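To make the QLoRA step concrete, here is a minimal sketch of how a 4-bit LoRA "healing" setup could be assembled with transformers, bitsandbytes, and peft. The checkpoint path, LoRA rank, and target modules are illustrative assumptions, not the Axolotl configuration actually used for this model.

```python
# Illustrative QLoRA setup for post-pruning training on minipile.
# All hyperparameters and the checkpoint path are assumptions, not the
# settings used to produce llama3-42b-v0 (which was trained with Axolotl).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_path = "path/to/pruned-llama3-42b"  # hypothetical pruned checkpoint

# Load the pruned base model in 4-bit NF4 with BF16 compute (the QLoRA recipe)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections (illustrative choice)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ~100M tokens of plain-text continued-pretraining data
dataset = load_dataset("JeanKaddour/minipile", split="train")
# ...tokenize `dataset` and train with your trainer of choice (Axolotl,
# transformers.Trainer, etc.); only the LoRA weights are updated.
```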
Implementation Details
The model uses the pruning methodology described in "The Unreasonable Ineffectiveness of the Deeper Layers", with the dropped layers selected using the PruneMe tool (a rough sketch of the selection criterion follows the list below). Weights are stored in BF16, and the model was trained using the Axolotl framework.
- Pruned from 70B to 42B parameters while maintaining performance
- Trained using QLoRA on minipile dataset
- Implements advanced layer selection via PruneMe
- Supports PyTorch and text-generation-inference
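The paper's selection criterion can be sketched in a few lines: measure the angular distance between the hidden states entering and leaving each candidate block of layers, and drop the block where that distance is smallest. Below is a rough illustration of that idea; the source checkpoint, calibration text, and block size `n` are assumptions, and PruneMe implements the full procedure (many calibration samples, a sweep over block sizes).

```python
# Minimal sketch of the angular-distance layer-selection idea from the paper;
# PruneMe automates this scan. Model id, calibration text, and block size n
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

source_id = "meta-llama/Meta-Llama-3-70B"  # assumed source checkpoint
n = 8  # size of the contiguous layer block to consider dropping (illustrative)

tokenizer = AutoTokenizer.from_pretrained(source_id)
model = AutoModelForCausalLM.from_pretrained(
    source_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A real run would average over many calibration samples (e.g. from minipile)
text = "Calibration text representative of the pruning dataset."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
hidden = [h.float().cpu() for h in out.hidden_states]  # one tensor per layer boundary

def angular_distance(a, b):
    """Mean per-token angular distance between two hidden-state tensors."""
    cos = torch.nn.functional.cosine_similarity(a, b, dim=-1)
    return (torch.arccos(cos.clamp(-1.0, 1.0)) / torch.pi).mean().item()

# The block whose input and output hidden states are most similar (smallest
# angular distance) is the most redundant, so it is the candidate for removal.
scores = {start: angular_distance(hidden[start], hidden[start + n])
          for start in range(len(hidden) - n)}
best = min(scores, key=scores.get)
print(f"Most redundant block: layers {best}..{best + n - 1} "
      f"(distance {scores[best]:.4f})")
```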
Core Capabilities
- Strong MMLU performance (76.69% accuracy)
- Exceptional social sciences scoring (86.68% accuracy)
- Robust performance on Winogrande (80.27% accuracy)
- Strong HellaSwag performance (80.25% normalized accuracy)
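Scores like these are typically produced with EleutherAI's lm-evaluation-harness; the exact harness version and few-shot settings behind these numbers are not documented here. A sketch of how one might re-run the same tasks, assuming the harness's v0.4 Python API and a local copy of the checkpoint:

```python
# Sketch: re-running the cited benchmarks with lm-evaluation-harness.
# Assumes the v0.4 Python API; the exact settings behind the card's numbers
# (harness version, few-shot counts) are not specified.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/llama3-42b-v0,dtype=bfloat16",  # assumed path
    tasks=["mmlu", "hellaswag", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```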
Frequently Asked Questions
Q: What makes this model unique?
This model demonstrates that significant parameter reduction through pruning can maintain strong performance metrics. It's particularly notable for being a base model without instruction tuning, making it suitable for custom fine-tuning projects.
Q: What are the recommended use cases?
As a base model, it is best suited to custom fine-tuning rather than direct deployment. It should not be used with the Llama 3 instruct prompt format: its chat special tokens are randomly initialized and it has received no instruction tuning.
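Because the chat special tokens are untrained, prompts should be plain text completions. A minimal sketch with transformers, assuming the checkpoint is available locally or on the Hub (the path below is a placeholder):

```python
# Minimal sketch: plain text completion, deliberately avoiding the Llama 3
# instruct/chat template, since this base model's chat tokens are untrained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama3-42b-v0"  # placeholder for the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The key idea behind transformer layer pruning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```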