OpenLLaMA 13B
Property | Value |
---|---|
License | Apache 2.0 |
Training Data | RedPajama-Data-1T |
Framework | PyTorch, JAX |
Author | openlm-research |
What is open_llama_13b?
OpenLLaMA 13B is a permissively licensed open-source reproduction of Meta AI's LLaMA language model. It's trained on 1 trillion tokens from the RedPajama dataset and achieves performance comparable to the original LLaMA model across various benchmarks.
Implementation Details
The model follows the same architecture and training hyperparameters as the original LLaMA and was trained on cloud TPU-v4s with the EasyLM framework. Training combines normal data parallelism with fully sharded data parallelism (also known as ZeRO stage 3) to balance throughput and memory usage.
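For readers more familiar with PyTorch than EasyLM, the sketch below shows an analogous fully sharded (ZeRO stage 3 style) setup using PyTorch's FSDP wrapper. It is illustrative only and not the code path used to train OpenLLaMA; the model repo name is real, but the launch setup and learning rate are placeholder choices.

```python
# Illustrative ZeRO-3-style sharding with PyTorch FSDP (not the EasyLM/TPU training code).
# Launch with one process per GPU, e.g. via torchrun; a 13B model needs substantial GPU memory.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import LlamaForCausalLM

def main():
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = LlamaForCausalLM.from_pretrained("openlm-research/open_llama_13b")

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # which is the same idea as ZeRO stage 3.
    model = FSDP(model, device_id=local_rank)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # placeholder hyperparameter
    # ... standard training loop: forward pass, loss.backward(), optimizer.step()

if __name__ == "__main__":
    main()
```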
- Trained on 1 trillion tokens of the RedPajama dataset (which itself contains over 1.2 trillion tokens)
- Identical architecture to the original LLaMA
- Weights released in both PyTorch and JAX formats
- Available through Hugging Face Transformers library
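A minimal loading and generation sketch with Hugging Face Transformers is shown below. It uses the slow tokenizer, since the auto-converted fast tokenizer has known issues for the OpenLLaMA models; the dtype, device placement, and max_new_tokens are illustrative choices rather than requirements.

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = "openlm-research/open_llama_13b"

# Use the slow (SentencePiece) tokenizer; the auto-converted fast tokenizer
# is known to produce incorrect tokenization for this model.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

generation_output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(generation_output[0]))
```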
Core Capabilities
- Strong performance on tasks like ARC, PIQA, and ANLI
- Matches or exceeds original LLaMA performance on several benchmarks
- Achieves 91% accuracy on the ReCoRD evaluation
- Effective at both few-shot and zero-shot tasks
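As a rough illustration of zero-shot versus few-shot use with a base (non-instruction-tuned) model, the sketch below feeds both prompt styles to the model. It assumes `model` and `tokenizer` have been loaded as in the earlier sketch, and the example questions are arbitrary.

```python
# Zero-shot: the question alone; few-shot: a handful of worked examples first.
zero_shot = "Q: What is the capital of France?\nA:"

few_shot = (
    "Q: What is the capital of Germany?\nA: Berlin\n"
    "Q: What is the capital of Japan?\nA: Tokyo\n"
    "Q: What is the capital of France?\nA:"
)

for prompt in (zero_shot, few_shot):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=8)
    # Print only the newly generated continuation, not the prompt.
    print(tokenizer.decode(output[0][input_ids.shape[1]:]))
```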
Frequently Asked Questions
Q: What makes this model unique?
OpenLLaMA 13B stands out for being a fully open-source, Apache 2.0 licensed alternative to the original LLaMA model, trained completely from scratch including the tokenizer. It achieves comparable performance while being freely available for commercial use.
Q: What are the recommended use cases?
The model is suitable for various natural language processing tasks including question-answering, text completion, and reasoning tasks. It's particularly effective for applications requiring strong performance on academic benchmarks and general language understanding.