llama3-42b-v0
| Property | Value |
|---|---|
| Parameter Count | 43.2B |
| License | llama3 |
| Architecture | Llama 3 (Pruned) |
| Training Data | JeanKaddour/minipile |
| Paper | The Unreasonable Ineffectiveness of the Deeper Layers |
What is llama3-42b-v0?
llama3-42b-v0 is a pruned version of Meta's Llama 3 70B model, reduced to roughly 42B (43.2B) parameters via layer pruning. After pruning, this base model was trained with QLoRA on approximately 100M tokens from the minipile dataset to recover performance. It demonstrates substantial model compression while retaining strong benchmark scores.
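To make the QLoRA step concrete, here is a minimal sketch of how a 4-bit LoRA "healing" setup could be assembled with transformers, bitsandbytes, and peft. The checkpoint path, LoRA rank, and target modules are illustrative assumptions, not the Axolotl configuration actually used for this model.

```python
# Illustrative QLoRA setup for post-pruning training on minipile.
# All hyperparameters and the checkpoint path are assumptions, not the
# settings used to produce llama3-42b-v0 (which was trained with Axolotl).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_path = "path/to/pruned-llama3-42b"  # hypothetical pruned checkpoint

# Load the pruned base model in 4-bit NF4 with BF16 compute (the QLoRA recipe)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections (illustrative choice)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ~100M tokens of plain-text continued-pretraining data
dataset = load_dataset("JeanKaddour/minipile", split="train")
# ...tokenize `dataset` and train with your trainer of choice (Axolotl,
# transformers.Trainer, etc.); only the LoRA weights are updated.
```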
Implementation Details
The model uses the pruning methodology described in "The Unreasonable Ineffectiveness of the Deeper Layers", with the dropped layers selected using the PruneMe tool (a rough sketch of the selection criterion follows the list below). Weights are stored in BF16, and the model was trained using the Axolotl framework.
- Pruned from 70B to 42B parameters while maintaining performance
- Trained using QLoRA on minipile dataset
- Implements advanced layer selection via PruneMe
- Supports PyTorch and text-generation-inference
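The paper's selection criterion can be sketched in a few lines: measure the angular distance between the hidden states entering and leaving each candidate block of layers, and drop the block where that distance is smallest. Below is a rough illustration of that idea; the source checkpoint, calibration text, and block size `n` are assumptions, and PruneMe implements the full procedure (many calibration samples, a sweep over block sizes).

```python
# Minimal sketch of the angular-distance layer-selection idea from the paper;
# PruneMe automates this scan. Model id, calibration text, and block size n
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

source_id = "meta-llama/Meta-Llama-3-70B"  # assumed source checkpoint
n = 8  # size of the contiguous layer block to consider dropping (illustrative)

tokenizer = AutoTokenizer.from_pretrained(source_id)
model = AutoModelForCausalLM.from_pretrained(
    source_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A real run would average over many calibration samples (e.g. from minipile)
text = "Calibration text representative of the pruning dataset."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
hidden = [h.float().cpu() for h in out.hidden_states]  # one tensor per layer boundary

def angular_distance(a, b):
    """Mean per-token angular distance between two hidden-state tensors."""
    cos = torch.nn.functional.cosine_similarity(a, b, dim=-1)
    return (torch.arccos(cos.clamp(-1.0, 1.0)) / torch.pi).mean().item()

# The block whose input and output hidden states are most similar (smallest
# angular distance) is the most redundant, so it is the candidate for removal.
scores = {start: angular_distance(hidden[start], hidden[start + n])
          for start in range(len(hidden) - n)}
best = min(scores, key=scores.get)
print(f"Most redundant block: layers {best}..{best + n - 1} "
      f"(distance {scores[best]:.4f})")
```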
Core Capabilities
- Strong MMLU performance (76.69% accuracy)
- Exceptional social sciences scoring (86.68% accuracy)
- Robust performance on Winogrande (80.27% accuracy)
- Strong HellaSwag performance (80.25% normalized accuracy)
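Scores like these are typically produced with EleutherAI's lm-evaluation-harness; the exact harness version and few-shot settings behind these numbers are not documented here. A sketch of how one might re-run the same tasks, assuming the harness's v0.4 Python API and a local copy of the checkpoint:

```python
# Sketch: re-running the cited benchmarks with lm-evaluation-harness.
# Assumes the v0.4 Python API; the exact settings behind the card's numbers
# (harness version, few-shot counts) are not specified.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/llama3-42b-v0,dtype=bfloat16",  # assumed path
    tasks=["mmlu", "hellaswag", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```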
Frequently Asked Questions
Q: What makes this model unique?
This model demonstrates that significant parameter reduction through pruning can maintain strong performance metrics. It's particularly notable for being a base model without instruction tuning, making it suitable for custom fine-tuning projects.
Q: What are the recommended use cases?
As a base model, it is best suited to custom fine-tuning rather than direct deployment. It should not be used with the Llama 3 instruct prompt format: its chat special tokens are randomly initialized and it has received no instruction tuning.
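Because the chat special tokens are untrained, prompts should be plain text completions. A minimal sketch with transformers, assuming the checkpoint is available locally or on the Hub (the path below is a placeholder):

```python
# Minimal sketch: plain text completion, deliberately avoiding the Llama 3
# instruct/chat template, since this base model's chat tokens are untrained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama3-42b-v0"  # placeholder for the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The key idea behind transformer layer pruning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```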