MOSS-moon-003-sft

Maintained by: fnlp

  • Model Size: 16B parameters
  • License: AGPL-3.0
  • Base Model: CodeGen-based
  • Training Data: 700B tokens (100B Chinese, 20B English)
  • Paper: Research Paper

What is moss-moon-003-sft?

MOSS is an advanced multilingual language model developed by Fudan University, specifically designed for conversational AI applications. This supervised fine-tuned (SFT) version builds upon the base model with 1.1M multi-turn conversations, creating a more focused and controlled dialogue system. The model maintains fluency in both English and Chinese while incorporating strong safety measures and ethical guidelines.

Implementation Details

Built on a 16B-parameter transformer architecture, MOSS requires roughly 31GB of GPU memory for FP16 inference, though quantized versions (INT4/INT8) are available for more efficient deployment. It supports both single-GPU and multi-GPU configurations, making it adaptable to varied computational resources.

  • Pre-trained on 700B tokens across multiple languages
  • Supervised fine-tuning on 1.1M conversations
  • Supports plugin architecture for external tool integration
  • Available in multiple quantized versions for efficiency
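The memory figures above follow directly from parameter count and numeric precision. As a rough sketch (weights only; the ~31GB FP16 figure cited earlier also includes activations, KV cache, and framework overhead):

```python
# Back-of-the-envelope GPU memory estimate for storing model weights
# at different precisions. Covers parameters only; activations and
# runtime overhead add several more GB in practice.

def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory (in GiB) needed to hold the model weights."""
    return n_params * bits_per_param / 8 / 1024**3

PARAMS = 16e9  # MOSS-moon-003: 16B parameters

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_memory_gib(PARAMS, bits):.1f} GiB")
# FP16 works out to ~29.8 GiB for weights alone, consistent with the
# ~31GB total reported for FP16 inference.
```

This also makes clear why the INT4 variant fits on consumer GPUs: quartering the bits per parameter quarters the weight footprint.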

Core Capabilities

  • Multilingual conversation in English and Chinese
  • Plugin integration (search, calculator, text-to-image)
  • Code generation and understanding
  • Mathematical problem solving
  • Strict safety controls and ethical guidelines
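The plugin capability works by routing tool requests emitted by the model to external handlers. The wire format is model-specific; the sketch below is a hypothetical dispatcher (the plugin names, `dispatch` helper, and handler signatures are illustrative assumptions, not MOSS's actual plugin API):

```python
# Hypothetical plugin dispatcher: maps a tool name requested by the
# model to a registered local handler. Names and signatures are
# illustrative, not MOSS's actual plugin interface.

def calculator(expression: str) -> str:
    # Evaluate simple arithmetic. A real deployment would use a safe
    # expression parser rather than eval.
    return str(eval(expression, {"__builtins__": {}}))

PLUGINS = {
    "calculator": calculator,
    # "search" and "text-to-image" handlers would be registered here.
}

def dispatch(tool: str, argument: str) -> str:
    """Route a model-requested tool call to its handler."""
    handler = PLUGINS.get(tool)
    if handler is None:
        return f"unknown plugin: {tool}"
    return handler(argument)

print(dispatch("calculator", "21 * 2"))  # -> 42
```

The key design point is that the model only names the tool and its argument; execution happens outside the model, so handlers can be swapped or sandboxed independently.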

Frequently Asked Questions

Q: What makes this model unique?

MOSS stands out for its balanced approach to multilingual capabilities, strong safety controls, and plugin architecture. Unlike many models that excel in either English or Chinese, MOSS maintains high performance in both languages while incorporating robust ethical guidelines and refusal capabilities for inappropriate requests.

Q: What are the recommended use cases?

The model is well-suited for conversational AI applications, customer service, content generation, and educational assistance. Its plugin architecture makes it particularly valuable for tasks requiring external tool integration, such as web searches or calculations. The model can be deployed in both academic and commercial settings, though commercial use requires specific authorization.
