SmolLM2-360M

Maintained by: HuggingFaceTB

Parameter Count: 360M
Training Tokens: 4 trillion
Architecture: Transformer decoder
License: Apache 2.0
Paper: arXiv:2502.02737

What is SmolLM2-360M?

SmolLM2-360M is part of the SmolLM2 family of compact language models, designed to deliver efficient on-device AI while maintaining strong performance. This 360M-parameter model improves substantially on its predecessor and was trained on a diverse 4-trillion-token dataset that includes FineWeb-Edu, DCLM, and The Stack.

Implementation Details

The model uses a Transformer decoder architecture and was trained with the nanotron framework on 128 H100 GPUs. It supports both CPU and GPU inference, with bfloat16 precision available for efficient deployment.

  • Memory footprint: approximately 723.56 MB
  • Supports both full precision and bfloat16 inference
  • Compatible with the Hugging Face Transformers library (see the loading sketch below this list)
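
The snippet below is a minimal sketch of loading the base model with Transformers, assuming the `transformers` and `torch` packages are installed; the prompt, generation settings, and dtype handling are illustrative.

```python
# Minimal sketch: load SmolLM2-360M with Hugging Face Transformers.
# Uses bfloat16 on GPU and full precision on CPU; adjust to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
).to(device)

# Simple greedy-ish completion to sanity-check the setup
inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```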

Core Capabilities

  • Strong performance on knowledge and reasoning tasks (HellaSwag: 54.5, PIQA: 71.7)
  • Instruction following and task completion (see the chat sketch after this list)
  • Text rewriting and summarization
  • Function calling support
  • Multi-task capability with emphasis on educational content
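
For instruction following, rewriting, and function calling, the instruct-tuned variant is typically used. The sketch below is an assumption-laden example: it presumes the SmolLM2-360M-Instruct checkpoint and its built-in chat template, and the prompt and sampling parameters are illustrative rather than recommended values.

```python
# Hedged sketch: instruction following with the (assumed) SmolLM2-360M-Instruct
# checkpoint, using the tokenizer's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"  # assumed instruct checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [
    {"role": "user", "content": "Rewrite this sentence more formally: the model runs great on phones."}
]
# apply_chat_template formats the conversation the way the instruct model expects
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.2, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```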

Frequently Asked Questions

Q: What makes this model unique?

The model's key strength lies in its compact size while maintaining competitive performance. It's specifically optimized for on-device deployment while offering capabilities typically found in larger models.

Q: What are the recommended use cases?

The model is well-suited for applications requiring on-device AI capabilities, including text generation, instruction following, and educational applications. It's particularly effective for scenarios where computational resources are limited but reliable AI assistance is needed.
