OpenLLaMA 13B EasyLM
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Training Data | RedPajama-Data-1T |
| Research Paper | Link to Paper |
What is open_llama_13b_easylm?
OpenLLaMA 13B is a permissively licensed, open-source reproduction of Meta AI's LLaMA language model. Developed by Berkeley AI Research in collaboration with Stability AI, it is trained on the RedPajama dataset, which contains over 1.2 trillion tokens. The model maintains architectural parity with the original LLaMA while remaining openly accessible under the Apache 2.0 license.
Implementation Details
The model is implemented in both PyTorch and JAX, with training conducted on cloud TPU-v4s using the EasyLM framework. Training combines normal data parallelism with fully sharded data parallelism (FSDP) for efficiency, reaching a throughput of over 2,200 tokens per second per TPU-v4 chip.
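To make the sharding scheme concrete, below is a minimal JAX sketch of FSDP-style parameter sharding. It is not EasyLM's actual training code; the mesh axis name, array shape, and dtype are illustrative assumptions.

```python
# Illustrative FSDP-style parameter sharding in JAX (not EasyLM's actual code).
# Each device on the "fsdp" mesh axis holds only a slice of the parameter array,
# which is the core idea behind fully sharded data parallelism.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = np.array(jax.devices())          # e.g. TPU-v4 chips; falls back to CPU locally
mesh = Mesh(devices, axis_names=("fsdp",))

# A toy weight matrix standing in for one transformer parameter tensor.
weights = jnp.zeros((4096, 4096), dtype=jnp.bfloat16)

# Shard the first dimension across the "fsdp" axis: each device stores 1/N of the tensor.
sharded_weights = jax.device_put(weights, NamedSharding(mesh, PartitionSpec("fsdp", None)))
print(sharded_weights.sharding)
```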
- Compatible with the Hugging Face transformers library (see the loading sketch after this list)
- Trained using the same hyperparameters as original LLaMA
- Includes both EasyLM and PyTorch format weights
- Requires the BOS token (id=1) to be prepended for optimal few-shot evaluation
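Since the PyTorch-format weights load through the Hugging Face transformers library, a minimal loading and prompting sketch might look like the following. The repository id is an assumption (substitute the actual checkpoint location), and the slow LlamaTokenizer is used so that the BOS token (id=1) is prepended by default.

```python
# Minimal loading/prompting sketch via Hugging Face transformers.
# The repo id below is an assumption; point it at the actual PyTorch-format weights.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "openlm-research/open_llama_13b"  # assumed repository id

# The slow (SentencePiece) LlamaTokenizer prepends the BOS token (id=1) by default.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Here float16 and device_map="auto" are used so the 13B weights fit on available accelerators; adjust the dtype and device placement to match your hardware.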
Core Capabilities
- Matches or exceeds original LLaMA performance on multiple benchmarks
- Achieves a 0.57 average score across standard evaluation benchmarks
- Particularly strong on tasks such as ARC-Easy (0.75 accuracy) and BoolQ (0.75 accuracy)
- Effective for both zero-shot and few-shot learning tasks
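As a concrete illustration of few-shot use, the sketch below assembles a prompt from invented example Q/A pairs (not drawn from any benchmark) and checks that the BOS token (id=1) is in place; the repository id is again an assumption.

```python
# Few-shot prompt assembly sketch; the exemplar Q/A pairs are invented for illustration.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_13b")  # assumed repo id

few_shot = [
    ("Is the sky blue on a clear day?", "yes"),
    ("Can penguins fly?", "no"),
]
question = "Do fish live in water?"

prompt = "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in few_shot)
prompt += f"Question: {question}\nAnswer:"

# The slow tokenizer prepends the BOS token (id=1) by default, matching the
# recommendation noted in the implementation details above.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
assert input_ids[0, 0].item() == tokenizer.bos_token_id == 1
# `input_ids` can now be passed to model.generate() as in the previous sketch.
```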
Frequently Asked Questions
Q: What makes this model unique?
OpenLLaMA 13B stands out for being a fully open-source reproduction of LLaMA with comparable performance, trained from scratch on the RedPajama dataset. Its Apache 2.0 license makes it more accessible for both research and commercial applications.
Q: What are the recommended use cases?
The model is well-suited for a variety of NLP tasks including question-answering, reasoning, and general language understanding. It performs particularly well on tasks requiring logical reasoning and reading comprehension, as evidenced by its strong performance on ARC, BoolQ, and PIQA benchmarks.