SmolLM2-360M
| Property | Value |
|---|---|
| Parameter Count | 360M |
| Training Tokens | 4 trillion |
| Architecture | Transformer decoder |
| License | Apache 2.0 |
| Paper | arXiv:2502.02737 |
What is SmolLM2-360M?
SmolLM2-360M is part of the SmolLM2 family of compact language models, designed to provide efficient, on-device AI capabilities while maintaining strong performance. This 360M-parameter model is a significant advance over its predecessor and was trained on a diverse 4-trillion-token dataset that includes FineWeb-Edu, DCLM, and The Stack.
Implementation Details
The model uses a Transformer decoder architecture and was trained with the nanotron framework on 128 H100 GPUs. It supports both CPU and GPU inference, with bfloat16 precision available for efficient deployment.
- Memory footprint: approximately 723.56 MB in bfloat16
- Supports both full precision and bfloat16 inference
- Compatible with the Hugging Face Transformers library (see the loading sketch after this list)
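For reference, a minimal loading-and-generation sketch using the Hugging Face Transformers API is shown below. The checkpoint ID HuggingFaceTB/SmolLM2-360M is assumed here, and the dtype choice simply falls back to full precision on CPU; treat it as a starting point rather than a definitive recipe.

```python
# Minimal inference sketch for SmolLM2-360M (checkpoint ID assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M"  # assumed Hub ID
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    # bfloat16 roughly halves memory versus float32: 360M params x 2 bytes ~ 720 MB
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
).to(device)

print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```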
Core Capabilities
- Strong performance on knowledge and reasoning tasks (HellaSwag: 54.5, PIQA: 71.7)
- Instruction following and task completion (see the sketch after this list)
- Text rewriting and summarization
- Function calling support
- Multi-task capability with emphasis on educational content
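Instruction following and function calling are generally exercised through the instruction-tuned variant and its chat template. The sketch below assumes a checkpoint ID of HuggingFaceTB/SmolLM2-360M-Instruct; adjust the prompt and sampling settings to your use case.

```python
# Instruction-following sketch via the chat template (checkpoint ID assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"  # assumed Hub ID
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [
    {"role": "user", "content": "Rewrite this in one sentence: The mitochondrion "
     "is the organelle that produces most of the cell's ATP."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.2)
# Strip the prompt tokens so only the model's reply is printed.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```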
Frequently Asked Questions
Q: What makes this model unique?
The model's key strength is its combination of compact size and competitive performance. It is optimized for on-device deployment while offering capabilities typically found in larger models.
Q: What are the recommended use cases?
The model is well suited to applications requiring on-device AI, including text generation, instruction following, and educational use cases. It is particularly effective where computational resources are limited but reliable AI assistance is still needed.
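For resource-constrained, CPU-only settings of the kind described above, the pipeline API offers a compact way to run the model; the sketch below again assumes the HuggingFaceTB/SmolLM2-360M checkpoint ID.

```python
# CPU-only text generation sketch using the Transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-360M",  # assumed Hub ID
    device=-1,  # -1 selects the CPU, for deployments without a GPU
)

result = generator("Photosynthesis is the process by which", max_new_tokens=40)
print(result[0]["generated_text"])
```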