PLaMo 2 1B
| Property | Value |
|---|---|
| Parameter Count | 1 Billion |
| Training Tokens | 4 Trillion |
| Languages | English, Japanese |
| License | Apache License 2.0 |
| Developer | Preferred Elements, Inc. |
What is plamo-2-1b?
PLaMo 2 1B is a bilingual (English/Japanese) language model built on a hybrid architecture. Developed by Preferred Elements, Inc., it combines Mamba's selective State Space Model (SSM) layers with sliding window attention, aiming for a language model that is both efficient and capable.
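The exact layer layout of PLaMo 2 is not spelled out here, but the general shape of such a hybrid stack can be sketched. The code below is illustrative only: a simplified recurrent block stands in for Mamba's selective SSM, alternating with causal sliding-window attention; the block ratio, window size, and every class name are assumptions for illustration, not PLaMo 2's actual design.

```python
# Illustrative sketch of a hybrid SSM / sliding-window-attention stack.
# All layer choices below are assumptions, not PLaMo 2's published design.
import torch
import torch.nn as nn


class SSMBlock(nn.Module):
    """Stand-in for a Mamba-style selective SSM block (simplified to a GRU here)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mixer = nn.GRU(dim, dim, batch_first=True)  # placeholder for the SSM scan

    def forward(self, x):
        out, _ = self.mixer(self.norm(x))
        return x + out


class SlidingWindowAttentionBlock(nn.Module):
    """Self-attention restricted to a local causal window via a boolean mask."""
    def __init__(self, dim, num_heads=4, window=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.window = window

    def forward(self, x):
        seq_len = x.size(1)
        idx = torch.arange(seq_len)
        dist = idx.unsqueeze(0) - idx.unsqueeze(1)  # dist[i, j] = j - i
        # Query i may only attend to keys j with i - window < j <= i.
        mask = (dist > 0) | (dist <= -self.window)
        h = self.norm(x)
        out, _ = self.attn(h, h, h, attn_mask=mask)
        return x + out


class HybridStack(nn.Module):
    """Alternate SSM blocks and sliding-window attention blocks."""
    def __init__(self, dim, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            SSMBlock(dim) if i % 2 == 0 else SlidingWindowAttentionBlock(dim)
            for i in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, sequence, hidden)
    print(HybridStack(dim=64)(x).shape)  # torch.Size([2, 16, 64])
```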
Implementation Details
Training proceeds in two phases: 3.5T tokens in phase 1 and 0.5T tokens in phase 2. The model uses enhanced normalization layers for improved training stability and the Mamba2 kernel for computational efficiency. The tokenizer is optimized with numba, a JIT compiler for numerical functions (a toy sketch follows the feature list below).
- Hybrid architecture combining Mamba SSM and sliding window attention
- Specialized training distribution: 45% English, 30% Japanese, 15% Coding, 10% Other content
- Enhanced normalization layers for stability
- Optimized tokenizer with numba implementation
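PLaMo 2's tokenizer code is not reproduced here; the toy sketch below only illustrates how numba's `@njit` decorator compiles the kind of tight numeric loop a tokenizer scorer runs. The function name and scoring rule are hypothetical.

```python
# Illustration only: numba JIT-compiling a hot numeric loop of the kind a
# tokenizer scorer might run. The function and scoring rule are hypothetical,
# not PLaMo 2's actual tokenizer code.
import numpy as np
from numba import njit


@njit(cache=True)
def best_segmentation_score(piece_scores: np.ndarray) -> float:
    """Viterbi-style forward pass over per-position piece scores (toy example)."""
    n = piece_scores.shape[0]
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    for i in range(n):
        for j in range(i + 1, min(i + 8, n) + 1):  # pieces up to 8 characters long
            cand = best[i] + piece_scores[j - 1]
            if cand > best[j]:
                best[j] = cand
    return best[n]


scores = np.random.rand(1000).astype(np.float64)
print(best_segmentation_score(scores))  # first call compiles, later calls are fast
```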
Core Capabilities
- Bilingual processing in English and Japanese
- Efficient text generation and completion
- Code processing capabilities
- Flexible deployment options through Hugging Face Transformers
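A minimal loading and generation sketch with Hugging Face Transformers is shown below. It assumes the model is published under the repo id pfnet/plamo-2-1b and that its custom architecture code requires `trust_remote_code=True`; check the model card for the authoritative snippet.

```python
# Minimal sketch, assuming the repo id "pfnet/plamo-2-1b" and that the model's
# custom architecture code requires trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pfnet/plamo-2-1b"  # assumed repo id; confirm on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Plain text completion; the base model is not instruction-tuned for chat.
inputs = tokenizer("これからの人工知能技術は", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```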
Frequently Asked Questions
Q: What makes this model unique?
PLaMo 2 1B stands out for its hybrid architecture that combines Mamba SSM with sliding window attention, offering improved efficiency while maintaining high performance. Its bilingual capabilities and specialized training across multiple content types make it versatile for various applications.
Q: What are the recommended use cases?
The model is primarily designed for text generation tasks in both English and Japanese. However, it's important to note that it has NOT been instruction-tuned for chat dialog or other downstream tasks. Users should perform safety testing and tuning for their specific applications.