Llama-3.1-Tulu-3-70B-broken
| Property | Value |
|---|---|
| Parameter Count | 69.5B |
| License | Llama 3.1 Community License |
| Language | English |
| Base Model | Llama-3.1-Tulu-3-70B-DPO |
What is Llama-3.1-Tulu-3-70B-broken?
This is a research-focused release in the Tulu-3 family: a 70B-parameter model whose language modeling (LM) head is missing due to a checkpoint saving bug. It poses an open challenge for researchers: reconstruct the LM head, and the result could be a state-of-the-art model.
Implementation Details
The model is built on the Llama 3.1 architecture and is part of the Tulu-3 model family. It is designed for instruction following and retains the full transformer stack minus the LM head; a short sketch after the list below shows how to verify this from the checkpoint index. The model was trained on a mix of publicly available, synthetic, and human-created datasets.
- Built on Llama 3.1 70B base model
- Uses F32 tensor type
- Part of a larger training pipeline including SFT, DPO, and RLVR stages
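Because the head is absent from the shipped weights, its tensor simply does not appear in the checkpoint's shard index. A minimal sketch of how to confirm this, assuming the release is hosted on the Hugging Face Hub under the repo id shown (substitute the actual path if it differs):

```python
# Minimal sketch: confirm the LM head tensor is absent from the checkpoint.
# The repo id below is an assumption; substitute the actual Hub path.
import json

from huggingface_hub import hf_hub_download

repo_id = "allenai/Llama-3.1-Tulu-3-70B-broken"  # assumed repo id
index_path = hf_hub_download(repo_id, "model.safetensors.index.json")

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# A complete Llama checkpoint maps "lm_head.weight" to one of its shards;
# here the lookup should come back False.
print("lm_head.weight present:", "lm_head.weight" in weight_map)
```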
Core Capabilities
- Foundation for research in LM head reconstruction (one baseline is sketched after this list)
- Potential for SOTA performance on diverse tasks
- Specialized for MATH, GSM8K, and IFEval tasks
- Designed for instruction following
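One illustrative baseline for the reconstruction problem, offered here as a sketch rather than a method endorsed by the release: since transformers randomly initializes any tensors missing from a checkpoint (and warns about them), you can load the model, seed the missing head from the input embedding matrix as weight-tied models do, and fine-tune only the head. The repo id is again an assumption:

```python
# Illustrative baseline, not an endorsed method: initialize the missing LM head
# from the input embeddings (weight tying) and fine-tune only that matrix.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Llama-3.1-Tulu-3-70B-broken",  # assumed repo id; ~280 GB in F32
    torch_dtype=torch.float32,
)

# transformers randomly initializes tensors missing from the checkpoint, so
# lm_head.weight exists here but carries no trained signal. Overwrite it with
# the embedding matrix, which shares its [vocab_size, hidden_size] shape.
with torch.no_grad():
    model.lm_head.weight.copy_(model.model.embed_tokens.weight)

# Freeze everything except the head before fine-tuning it back into shape.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("lm_head")
```

Whether an initialization like this can be trained back to the original head's quality is precisely the open question the release poses.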
Frequently Asked Questions
Q: What makes this model unique?
This model is unique because its LM head was lost to a checkpoint saving bug, and it has been deliberately released in that incomplete state to encourage research into weight reconstruction methods. Successfully recovering the head's weights could yield a powerful state-of-the-art model.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, specifically for those interested in LM head reconstruction techniques. Without a functioning LM head it cannot produce meaningful next-token predictions, so it is not suitable for direct deployment in production environments.