MOSS-moon-003-sft

Maintained by: fnlp

  • Model Size: 16B parameters
  • License: AGPL-3.0
  • Base Model: CodeGen-based
  • Training Data: 700B tokens (100B Chinese, 20B English)
  • Paper: Research Paper

What is moss-moon-003-sft?

MOSS is an advanced multilingual language model developed by Fudan University, specifically designed for conversational AI applications. This supervised fine-tuned (SFT) version builds upon the base model with 1.1M multi-turn conversations, creating a more focused and controlled dialogue system. The model maintains fluency in both English and Chinese while incorporating strong safety measures and ethical guidelines.

Implementation Details

Built on a 16B-parameter transformer architecture, MOSS requires roughly 31GB of GPU memory for FP16 inference, though quantized versions (INT4/INT8) are available for more efficient deployment. It supports both single-GPU and multi-GPU configurations, making it adaptable to varied computational resources.

  • Pre-trained on 700B tokens across multiple languages
  • Supervised fine-tuning on 1.1M conversations
  • Supports plugin architecture for external tool integration
  • Available in multiple quantized versions for efficiency
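The memory figures above follow directly from parameter count and numeric precision. As a rough sketch (weights only; the ~31GB FP16 figure cited earlier also includes activations, KV cache, and framework overhead):

```python
# Back-of-the-envelope GPU memory estimate for storing model weights
# at different precisions. Covers parameters only; activations and
# runtime overhead add several more GB in practice.

def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory (in GiB) needed to hold the model weights."""
    return n_params * bits_per_param / 8 / 1024**3

PARAMS = 16e9  # MOSS-moon-003: 16B parameters

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_memory_gib(PARAMS, bits):.1f} GiB")
# FP16 works out to ~29.8 GiB for weights alone, consistent with the
# ~31GB total reported for FP16 inference.
```

This also makes clear why the INT4 variant fits on consumer GPUs: quartering the bits per parameter quarters the weight footprint.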

Core Capabilities

  • Multilingual conversation in English and Chinese
  • Plugin integration (search, calculator, text-to-image)
  • Code generation and understanding
  • Mathematical problem solving
  • Strict safety controls and ethical guidelines
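The plugin capability works by routing tool requests emitted by the model to external handlers. The wire format is model-specific; the sketch below is a hypothetical dispatcher (the plugin names, `dispatch` helper, and handler signatures are illustrative assumptions, not MOSS's actual plugin API):

```python
# Hypothetical plugin dispatcher: maps a tool name requested by the
# model to a registered local handler. Names and signatures are
# illustrative, not MOSS's actual plugin interface.

def calculator(expression: str) -> str:
    # Evaluate simple arithmetic. A real deployment would use a safe
    # expression parser rather than eval.
    return str(eval(expression, {"__builtins__": {}}))

PLUGINS = {
    "calculator": calculator,
    # "search" and "text-to-image" handlers would be registered here.
}

def dispatch(tool: str, argument: str) -> str:
    """Route a model-requested tool call to its handler."""
    handler = PLUGINS.get(tool)
    if handler is None:
        return f"unknown plugin: {tool}"
    return handler(argument)

print(dispatch("calculator", "21 * 2"))  # -> 42
```

The key design point is that the model only names the tool and its argument; execution happens outside the model, so handlers can be swapped or sandboxed independently.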

Frequently Asked Questions

Q: What makes this model unique?

MOSS stands out for its balanced approach to multilingual capabilities, strong safety controls, and plugin architecture. Unlike many models that excel in either English or Chinese, MOSS maintains high performance in both languages while incorporating robust ethical guidelines and refusal capabilities for inappropriate requests.

Q: What are the recommended use cases?

The model is well-suited for conversational AI applications, customer service, content generation, and educational assistance. Its plugin architecture makes it particularly valuable for tasks requiring external tool integration, such as web searches or calculations. The model can be deployed in both academic and commercial settings, though commercial use requires specific authorization.
