SmolLM2-360M
| Property | Value |
|---|---|
| Parameter Count | 360M |
| Training Tokens | 4 trillion |
| Architecture | Transformer decoder |
| License | Apache 2.0 |
| Paper | arXiv:2502.02737 |
What is SmolLM2-360M?
SmolLM2-360M is part of the SmolLM2 family of compact language models, designed to provide efficient, on-device AI capabilities while maintaining strong performance. This 360M-parameter model is a significant advance over its predecessor and was trained on a diverse 4-trillion-token dataset that includes FineWeb-Edu, DCLM, and The Stack.
Implementation Details
The model uses a Transformer decoder architecture and was trained with the nanotron framework on 128 H100 GPUs. It supports both CPU and GPU inference, with bfloat16 precision available for efficient deployment.
- Memory footprint: approximately 723.56 MB in bfloat16
- Supports both full precision and bfloat16 inference
- Compatible with the Hugging Face Transformers library (see the loading sketch after this list)
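For reference, a minimal loading-and-generation sketch using the Hugging Face Transformers API is shown below. The checkpoint ID HuggingFaceTB/SmolLM2-360M is assumed here, and the dtype choice simply falls back to full precision on CPU; treat it as a starting point rather than a definitive recipe.

```python
# Minimal inference sketch for SmolLM2-360M (checkpoint ID assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M"  # assumed Hub ID
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    # bfloat16 roughly halves memory versus float32: 360M params x 2 bytes ~ 720 MB
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
).to(device)

print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```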
Core Capabilities
- Strong performance on knowledge and reasoning tasks (HellaSwag: 54.5, PIQA: 71.7)
- Instruction following and task completion (see the sketch after this list)
- Text rewriting and summarization
- Function calling support
- Multi-task capability with emphasis on educational content
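Instruction following and function calling are generally exercised through the instruction-tuned variant and its chat template. The sketch below assumes a checkpoint ID of HuggingFaceTB/SmolLM2-360M-Instruct; adjust the prompt and sampling settings to your use case.

```python
# Instruction-following sketch via the chat template (checkpoint ID assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"  # assumed Hub ID
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [
    {"role": "user", "content": "Rewrite this in one sentence: The mitochondrion "
     "is the organelle that produces most of the cell's ATP."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.2)
# Strip the prompt tokens so only the model's reply is printed.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```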
Frequently Asked Questions
Q: What makes this model unique?
The model's key strength is its combination of compact size and competitive performance. It is optimized for on-device deployment while offering capabilities typically found in larger models.
Q: What are the recommended use cases?
The model is well suited to applications requiring on-device AI, including text generation, instruction following, and educational use cases. It is particularly effective where computational resources are limited but reliable AI assistance is still needed.
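For resource-constrained, CPU-only settings of the kind described above, the pipeline API offers a compact way to run the model; the sketch below again assumes the HuggingFaceTB/SmolLM2-360M checkpoint ID.

```python
# CPU-only text generation sketch using the Transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-360M",  # assumed Hub ID
    device=-1,  # -1 selects the CPU, for deployments without a GPU
)

result = generator("Photosynthesis is the process by which", max_new_tokens=40)
print(result[0]["generated_text"])
```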