Llama-3.1-Tulu-3-70B-broken
| Property | Value |
|---|---|
| Parameter Count | 69.5B |
| License | Llama 3.1 Community License |
| Language | English |
| Base Model | Llama-3.1-Tulu-3-70B-DPO |
What is Llama-3.1-Tulu-3-70B-broken?
This is a research-focused release in the Tulu-3 family: a 70B-parameter model whose language modeling (LM) head is missing due to a checkpoint saving bug. It poses an open challenge for researchers: reconstruct the LM head, and the result could be a state-of-the-art model.
Implementation Details
The model is built on the Llama 3.1 architecture and is part of the Tulu-3 model family. It is designed for instruction following and retains the full transformer stack minus the LM head; a short sketch after the list below shows how to verify this from the checkpoint index. The model was trained on a mix of publicly available, synthetic, and human-created datasets.
- Built on Llama 3.1 70B base model
- Uses F32 tensor type
- Part of a larger training pipeline including SFT, DPO, and RLVR stages
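Because the head is absent from the shipped weights, its tensor simply does not appear in the checkpoint's shard index. A minimal sketch of how to confirm this, assuming the release is hosted on the Hugging Face Hub under the repo id shown (substitute the actual path if it differs):

```python
# Minimal sketch: confirm the LM head tensor is absent from the checkpoint.
# The repo id below is an assumption; substitute the actual Hub path.
import json

from huggingface_hub import hf_hub_download

repo_id = "allenai/Llama-3.1-Tulu-3-70B-broken"  # assumed repo id
index_path = hf_hub_download(repo_id, "model.safetensors.index.json")

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# A complete Llama checkpoint maps "lm_head.weight" to one of its shards;
# here the lookup should come back False.
print("lm_head.weight present:", "lm_head.weight" in weight_map)
```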
Core Capabilities
- Foundation for research in LM head reconstruction (one baseline is sketched after this list)
- Potential for SOTA performance on diverse tasks
- Specialized for MATH, GSM8K, and IFEval tasks
- Designed for instruction following
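One illustrative baseline for the reconstruction problem, offered here as a sketch rather than a method endorsed by the release: since transformers randomly initializes any tensors missing from a checkpoint (and warns about them), you can load the model, seed the missing head from the input embedding matrix as weight-tied models do, and fine-tune only the head. The repo id is again an assumption:

```python
# Illustrative baseline, not an endorsed method: initialize the missing LM head
# from the input embeddings (weight tying) and fine-tune only that matrix.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Llama-3.1-Tulu-3-70B-broken",  # assumed repo id; ~280 GB in F32
    torch_dtype=torch.float32,
)

# transformers randomly initializes tensors missing from the checkpoint, so
# lm_head.weight exists here but carries no trained signal. Overwrite it with
# the embedding matrix, which shares its [vocab_size, hidden_size] shape.
with torch.no_grad():
    model.lm_head.weight.copy_(model.model.embed_tokens.weight)

# Freeze everything except the head before fine-tuning it back into shape.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("lm_head")
```

Whether an initialization like this can be trained back to the original head's quality is precisely the open question the release poses.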
Frequently Asked Questions
Q: What makes this model unique?
This model is unique because its LM head was lost to a checkpoint saving bug, and it has been deliberately released in that incomplete state to encourage research into weight reconstruction methods. Successfully recovering the head's weights could yield a powerful state-of-the-art model.
Q: What are the recommended use cases?
The model is primarily intended for research purposes, specifically for those interested in LM head reconstruction techniques. Without a functioning LM head it cannot produce meaningful next-token predictions, so it is not suitable for direct deployment in production environments.