OpenLLaMA 13B EasyLM
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Training Data | RedPajama-Data-1T |
| Research Paper | Link to Paper |
What is open_llama_13b_easylm?
OpenLLaMA 13B is a permissively licensed, open-source reproduction of Meta AI's LLaMA language model. Developed by Berkeley AI Research in collaboration with Stability AI, it is trained on the RedPajama dataset, which contains over 1.2 trillion tokens. The model maintains architectural parity with the original LLaMA while remaining openly accessible under the Apache 2.0 license.
Implementation Details
The model is implemented in both PyTorch and JAX, with training conducted on cloud TPU-v4s using the EasyLM framework. Training combines normal data parallelism with fully sharded data parallelism (FSDP) for efficiency, reaching a throughput of over 2,200 tokens per second per TPU-v4 chip.
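To make the sharding scheme concrete, below is a minimal JAX sketch of FSDP-style parameter sharding. It is not EasyLM's actual training code; the mesh axis name, array shape, and dtype are illustrative assumptions.

```python
# Illustrative FSDP-style parameter sharding in JAX (not EasyLM's actual code).
# Each device on the "fsdp" mesh axis holds only a slice of the parameter array,
# which is the core idea behind fully sharded data parallelism.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = np.array(jax.devices())          # e.g. TPU-v4 chips; falls back to CPU locally
mesh = Mesh(devices, axis_names=("fsdp",))

# A toy weight matrix standing in for one transformer parameter tensor.
weights = jnp.zeros((4096, 4096), dtype=jnp.bfloat16)

# Shard the first dimension across the "fsdp" axis: each device stores 1/N of the tensor.
sharded_weights = jax.device_put(weights, NamedSharding(mesh, PartitionSpec("fsdp", None)))
print(sharded_weights.sharding)
```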
- Compatible with the Hugging Face transformers library (see the loading sketch after this list)
- Trained using the same hyperparameters as original LLaMA
- Includes both EasyLM and PyTorch format weights
- Requires the BOS token (id=1) to be prepended for optimal few-shot evaluation
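Since the PyTorch-format weights load through the Hugging Face transformers library, a minimal loading and prompting sketch might look like the following. The repository id is an assumption (substitute the actual checkpoint location), and the slow LlamaTokenizer is used so that the BOS token (id=1) is prepended by default.

```python
# Minimal loading/prompting sketch via Hugging Face transformers.
# The repo id below is an assumption; point it at the actual PyTorch-format weights.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "openlm-research/open_llama_13b"  # assumed repository id

# The slow (SentencePiece) LlamaTokenizer prepends the BOS token (id=1) by default.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Here float16 and device_map="auto" are used so the 13B weights fit on available accelerators; adjust the dtype and device placement to match your hardware.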
Core Capabilities
- Matches or exceeds original LLaMA performance on multiple benchmarks
- Achieves a 0.57 average score across standard evaluation benchmarks
- Particularly strong on tasks such as ARC-Easy (0.75 accuracy) and BoolQ (0.75 accuracy)
- Effective for both zero-shot and few-shot learning tasks
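As a concrete illustration of few-shot use, the sketch below assembles a prompt from invented example Q/A pairs (not drawn from any benchmark) and checks that the BOS token (id=1) is in place; the repository id is again an assumption.

```python
# Few-shot prompt assembly sketch; the exemplar Q/A pairs are invented for illustration.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_13b")  # assumed repo id

few_shot = [
    ("Is the sky blue on a clear day?", "yes"),
    ("Can penguins fly?", "no"),
]
question = "Do fish live in water?"

prompt = "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in few_shot)
prompt += f"Question: {question}\nAnswer:"

# The slow tokenizer prepends the BOS token (id=1) by default, matching the
# recommendation noted in the implementation details above.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
assert input_ids[0, 0].item() == tokenizer.bos_token_id == 1
# `input_ids` can now be passed to model.generate() as in the previous sketch.
```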
Frequently Asked Questions
Q: What makes this model unique?
OpenLLaMA 13B stands out for being a fully open-source reproduction of LLaMA with comparable performance, trained from scratch on the RedPajama dataset. Its Apache 2.0 license makes it more accessible for both research and commercial applications.
Q: What are the recommended use cases?
The model is well-suited for a variety of NLP tasks including question-answering, reasoning, and general language understanding. It performs particularly well on tasks requiring logical reasoning and reading comprehension, as evidenced by its strong performance on ARC, BoolQ, and PIQA benchmarks.