OpenLLaMA 3B v2
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Training Data | Falcon RefinedWeb, StarCoder, RedPajama |
| Model Size | 3 billion parameters |
| Framework | PyTorch/JAX |
What is open_llama_3b_v2?
OpenLLaMA 3B v2 is a permissively licensed, open-source reproduction of Meta AI's LLaMA language model. Trained from scratch on 1 trillion tokens of publicly available data, it is designed to serve as a drop-in replacement for the original LLaMA weights in existing pipelines.
Implementation Details
The model was trained on cloud TPU-v4s using EasyLM, a JAX-based training pipeline, sustaining a throughput of over 2,200 tokens/second/TPU-v4 chip. Training combines normal data parallelism with fully sharded data parallelism (FSDP, also known as ZeRO stage 3) to balance throughput and memory usage; a simplified sketch of this layout follows the list below.
- Trained on multiple high-quality datasets including Falcon RefinedWeb, StarCoder, and RedPajama
- Follows the same preprocessing steps and training hyperparameters as the original LLaMA
- Available in both PyTorch and JAX formats
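EasyLM's actual training configuration is not reproduced here; the sketch below shows, under illustrative assumptions, how a 2-D JAX device mesh can combine a data-parallel axis with an FSDP axis. The axis names (`dp`, `fsdp`), mesh shape, and array sizes are hypothetical choices for demonstration only.

```python
import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Arrange all available devices into a (data-parallel, fsdp) grid.
# The fsdp axis has size 1 here so the example also runs on one device.
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("dp", "fsdp"))

# Batches are split along the 'dp' axis; parameters are sharded along 'fsdp'.
batch_sharding = NamedSharding(mesh, PartitionSpec("dp"))
param_sharding = NamedSharding(mesh, PartitionSpec("fsdp"))

x = jax.device_put(np.ones((8, 128), dtype=np.float32), batch_sharding)
w = jax.device_put(np.ones((128, 128), dtype=np.float32), param_sharding)

@jax.jit
def forward(x, w):
    # XLA inserts the collectives implied by the input shardings.
    return x @ w

y = forward(x, w)
print(y.sharding)  # inspect how the result ended up sharded
```

In a full FSDP setup the parameter axis spans many devices, so each device holds only a shard of the weights and gathers the rest on demand during the forward and backward passes.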
Core Capabilities
- General text generation and completion tasks
- Performance competitive with the original LLaMA across multiple benchmarks
- Seamless integration with the Hugging Face transformers library (see the usage sketch after this list)
- Support for context-aware text generation
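As a rough illustration of the transformers integration, the snippet below loads the checkpoint from the `openlm-research/open_llama_3b_v2` Hugging Face repository and generates a short completion. The prompt and generation settings are arbitrary examples; the slow SentencePiece tokenizer is used because auto-converted fast LLaMA tokenizers have been reported to mis-tokenize.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "openlm-research/open_llama_3b_v2"

# Slow (SentencePiece) tokenizer: fast LLaMA tokenizers auto-converted
# from SentencePiece have been reported to tokenize incorrectly.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision to reduce memory
    device_map="auto",          # requires the `accelerate` package
)

prompt = "Q: What is the largest animal?\nA:"  # arbitrary example prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Greedy decoding of up to 32 new tokens.
output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```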
Frequently Asked Questions
Q: What makes this model unique?
This model is unique in being an open-source, permissively licensed alternative to LLaMA, trained from scratch on publicly available datasets. It achieves comparable performance to the original while being freely available for commercial use.
Q: What are the recommended use cases?
The model is well-suited for various NLP tasks including text generation, completion, and analysis. It's particularly valuable for researchers and developers who need a powerful language model with permissive licensing for commercial applications.