PLaMo 2 1B
| Property | Value |
|---|---|
| Parameter Count | 1 Billion |
| Training Tokens | 4 Trillion |
| Languages | English, Japanese |
| License | Apache License 2.0 |
| Developer | Preferred Elements, Inc. |
What is plamo-2-1b?
PLaMo 2 1B is a bilingual (English/Japanese) language model built on a hybrid architecture. Developed by Preferred Elements, Inc., it combines Mamba's selective State Space Model (SSM) layers with sliding window attention, aiming for a language model that is both efficient and capable.
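The exact layer layout of PLaMo 2 is not spelled out here, but the general shape of such a hybrid stack can be sketched. The code below is illustrative only: a simplified recurrent block stands in for Mamba's selective SSM, alternating with causal sliding-window attention; the block ratio, window size, and every class name are assumptions for illustration, not PLaMo 2's actual design.

```python
# Illustrative sketch of a hybrid SSM / sliding-window-attention stack.
# All layer choices below are assumptions, not PLaMo 2's published design.
import torch
import torch.nn as nn


class SSMBlock(nn.Module):
    """Stand-in for a Mamba-style selective SSM block (simplified to a GRU here)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mixer = nn.GRU(dim, dim, batch_first=True)  # placeholder for the SSM scan

    def forward(self, x):
        out, _ = self.mixer(self.norm(x))
        return x + out


class SlidingWindowAttentionBlock(nn.Module):
    """Self-attention restricted to a local causal window via a boolean mask."""
    def __init__(self, dim, num_heads=4, window=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.window = window

    def forward(self, x):
        seq_len = x.size(1)
        idx = torch.arange(seq_len)
        dist = idx.unsqueeze(0) - idx.unsqueeze(1)  # dist[i, j] = j - i
        # Query i may only attend to keys j with i - window < j <= i.
        mask = (dist > 0) | (dist <= -self.window)
        h = self.norm(x)
        out, _ = self.attn(h, h, h, attn_mask=mask)
        return x + out


class HybridStack(nn.Module):
    """Alternate SSM blocks and sliding-window attention blocks."""
    def __init__(self, dim, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            SSMBlock(dim) if i % 2 == 0 else SlidingWindowAttentionBlock(dim)
            for i in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, sequence, hidden)
    print(HybridStack(dim=64)(x).shape)  # torch.Size([2, 16, 64])
```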
Implementation Details
Training proceeds in two phases: 3.5T tokens in phase 1 and 0.5T tokens in phase 2. The model uses enhanced normalization layers for improved training stability and the Mamba2 kernel for computational efficiency. The tokenizer is optimized with numba, a JIT compiler for numerical functions (a toy sketch follows the feature list below).
- Hybrid architecture combining Mamba SSM and sliding window attention
- Specialized training distribution: 45% English, 30% Japanese, 15% Coding, 10% Other content
- Enhanced normalization layers for stability
- Optimized tokenizer with numba implementation
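PLaMo 2's tokenizer code is not reproduced here; the toy sketch below only illustrates how numba's `@njit` decorator compiles the kind of tight numeric loop a tokenizer scorer runs. The function name and scoring rule are hypothetical.

```python
# Illustration only: numba JIT-compiling a hot numeric loop of the kind a
# tokenizer scorer might run. The function and scoring rule are hypothetical,
# not PLaMo 2's actual tokenizer code.
import numpy as np
from numba import njit


@njit(cache=True)
def best_segmentation_score(piece_scores: np.ndarray) -> float:
    """Viterbi-style forward pass over per-position piece scores (toy example)."""
    n = piece_scores.shape[0]
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    for i in range(n):
        for j in range(i + 1, min(i + 8, n) + 1):  # pieces up to 8 characters long
            cand = best[i] + piece_scores[j - 1]
            if cand > best[j]:
                best[j] = cand
    return best[n]


scores = np.random.rand(1000).astype(np.float64)
print(best_segmentation_score(scores))  # first call compiles, later calls are fast
```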
Core Capabilities
- Bilingual processing in English and Japanese
- Efficient text generation and completion
- Code processing capabilities
- Flexible deployment options through Hugging Face Transformers
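A minimal loading and generation sketch with Hugging Face Transformers is shown below. It assumes the model is published under the repo id pfnet/plamo-2-1b and that its custom architecture code requires `trust_remote_code=True`; check the model card for the authoritative snippet.

```python
# Minimal sketch, assuming the repo id "pfnet/plamo-2-1b" and that the model's
# custom architecture code requires trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pfnet/plamo-2-1b"  # assumed repo id; confirm on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Plain text completion; the base model is not instruction-tuned for chat.
inputs = tokenizer("これからの人工知能技術は", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```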
Frequently Asked Questions
Q: What makes this model unique?
PLaMo 2 1B stands out for its hybrid architecture that combines Mamba SSM with sliding window attention, offering improved efficiency while maintaining high performance. Its bilingual capabilities and specialized training across multiple content types make it versatile for various applications.
Q: What are the recommended use cases?
The model is primarily designed for text generation tasks in both English and Japanese. However, it's important to note that it has NOT been instruction-tuned for chat dialog or other downstream tasks. Users should perform safety testing and tuning for their specific applications.