MOSS-moon-003-base
| Property | Value |
|---|---|
| Parameters | 16B |
| License | AGPL-3.0 |
| Paper | CodeGen Paper |
| Developer | Fudan University (FNLP) |
What is moss-moon-003-base?
MOSS-moon-003-base is a powerful multilingual language model developed by Fudan University, built upon the CodeGen architecture. This 16B parameter model was pre-trained on an extensive dataset of 700B tokens, including 100B Chinese tokens and 20B English tokens, making it highly capable in both languages. The model serves as the foundation for the MOSS family of models, which can be enhanced with various plugins for expanded functionality.
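As a quick illustration of the shared English/Chinese vocabulary, the sketch below tokenizes one sentence in each language. It is a minimal example, not an official quick-start: the `fnlp/moss-moon-003-base` repository ID and the `trust_remote_code=True` flag are assumptions about how the checkpoint is distributed on the Hugging Face Hub.

```python
# Minimal sketch: inspect how the bilingual tokenizer splits English and Chinese text.
# Assumptions: the checkpoint is hosted as "fnlp/moss-moon-003-base" and its custom
# tokenizer code requires trust_remote_code=True.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "fnlp/moss-moon-003-base", trust_remote_code=True
)

for text in [
    "MOSS is a 16B bilingual language model.",
    "MOSS是一个大规模中英双语语言模型。",
]:
    ids = tokenizer(text)["input_ids"]
    # Print the token count and the first few tokens for a rough feel of the vocabulary.
    print(len(ids), tokenizer.convert_ids_to_tokens(ids)[:8])
```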
Implementation Details
The model was initialized from CodeGen and underwent further pre-training that consumed approximately 6.67×10^22 FLOPs. It is designed for high-end hardware: FP16 inference requires roughly a single A100/A800 or two NVIDIA 3090 GPUs, while INT4/INT8 quantization enables more memory-efficient deployment, down to a single 3090.
- Pre-trained on diverse datasets including the Pile, BigQuery, BigPython, and a private Chinese corpus
- Supports both FP16 and quantized (INT4/INT8) inference (see the loading sketch after this list)
- Serves as the foundation for the plugin-augmented MOSS variants
- Uses a tokenizer whose vocabulary covers both English and Chinese
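The sketch below shows the two deployment paths under stated assumptions: the `fnlp/moss-moon-003-base` repo ID and `trust_remote_code=True` are assumed from how the MOSS checkpoints are hosted, and the INT8 path uses generic `load_in_8bit` quantization from bitsandbytes rather than any officially released quantized weights.

```python
# Sketch of two ways to load the 16B weights for inference.
# In practice you would pick only one of the two options below.
import torch
from transformers import AutoModelForCausalLM

MODEL_ID = "fnlp/moss-moon-003-base"  # assumed Hugging Face repo ID

# Option 1: FP16 weights spread across available GPUs
# (roughly one A100/A800 or two 3090-class cards for a 16B model).
model_fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Option 2: 8-bit weights to roughly halve memory versus FP16
# (generic bitsandbytes quantization, requires the bitsandbytes package).
model_int8 = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)
```

The `device_map="auto"` option (which requires the accelerate package) lets Transformers shard the weights across whatever GPUs are visible, which is why the same call covers both the single-A100 and dual-3090 setups.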
Core Capabilities
- Multilingual understanding and generation in English and Chinese (see the generation sketch after this list)
- Code generation and comprehension
- Plugin support for web search, calculation, and equation solving (via the fine-tuned MOSS variants)
- Multi-turn dialogue handling and safety-aware, harm-preventing responses (via the fine-tuned MOSS variants)
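To make the first two bullets concrete, the following sketch prompts the base model directly with a code stub and a Chinese sentence. Because this is the raw pre-trained model rather than a dialogue-tuned variant, plain completion prompts are used; the repo ID, `trust_remote_code=True`, and the sampling settings are illustrative assumptions.

```python
# Generation sketch: Chinese text continuation and code completion with plain prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "fnlp/moss-moon-003-base"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
).eval()

prompts = [
    "def quicksort(arr):",              # code completion
    "机器学习与深度学习的主要区别在于",      # Chinese continuation
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs, max_new_tokens=64, do_sample=True, temperature=0.8, top_p=0.95
        )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```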
Frequently Asked Questions
Q: What makes this model unique?
MOSS-moon-003-base stands out for its balanced bilingual capabilities and for serving as the foundation of the plugin-augmented MOSS family, which extends it with tools such as web search, a calculator, and an equation solver while maintaining strong performance in both English and Chinese. Its initialization from CodeGen and extensive further pre-training also make it particularly capable at code-related tasks.
Q: What are the recommended use cases?
The model is well-suited for multilingual applications, code generation, general dialogue systems, and plugin-augmented interactions. It's particularly valuable for applications requiring both English and Chinese language processing or code-related tasks.