Helium-1-preview-2b
Property | Value
---|---
Parameter Count | 2 billion |
Model Type | Large Language Model |
Languages | English, French, German, Italian, Portuguese, Spanish |
License | CC-BY 4.0 |
Context Length | 4096 tokens |
Model URL | https://huggingface.co/kyutai/helium-1-preview-2b |
What is Helium-1-preview-2b?
Helium-1-preview-2b is a lightweight multilingual language model designed for edge and mobile devices. Developed by Kyutai, this 2B-parameter base model targets resource-constrained environments while maintaining strong capabilities across six European languages (English, French, German, Italian, Portuguese, and Spanish).
Implementation Details
The model features a 24-layer architecture with 20 attention heads and a model dimension of 2560. It was trained on a diverse dataset including Wikipedia, Stack Exchange, open-access scientific articles, and Common Crawl, using JAX on 128 NVIDIA H100 GPUs. The model uses a context window of 4096 tokens and a RoPE theta value of 100,000.
- 24 transformer layers with 20 attention heads
- 2560 model dimension and 7040 MLP dimension
- Trained on high-quality multilingual datasets
- Optimized for edge deployment
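As an illustration of how such a checkpoint is typically used, the sketch below loads the model with the Hugging Face transformers library and prints its configuration. It assumes the checkpoint is compatible with the standard AutoModelForCausalLM and AutoTokenizer classes; exact config attribute names may differ from the figures quoted above.

```python
# Minimal loading sketch (assumes compatibility with the standard Auto* classes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyutai/helium-1-preview-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 2B parameters fit comfortably in bf16 on most GPUs
    device_map="auto",
)

# The hyperparameters listed above (24 layers, 20 heads, 2560 hidden size,
# 4096-token context) should be visible in the loaded configuration.
print(model.config)
```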
Core Capabilities
- Strong benchmark performance, with 60.7% average accuracy on English benchmarks
- Competitive results on MMLU, TriviaQA, and other standard benchmarks
- Efficient multilingual processing with support for 6 European languages
- Designed for edge deployment and mobile applications
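For a quick sense of the multilingual behavior described above, the snippet below runs greedy generation on prompts in two of the supported languages via the transformers text-generation pipeline. The prompts and decoding settings are illustrative only, and the sketch assumes the checkpoint is supported by the pipeline API.

```python
# Illustrative multilingual generation using the transformers text-generation
# pipeline (assumes the checkpoint is supported by the pipeline API).
from transformers import pipeline

generator = pipeline("text-generation", model="kyutai/helium-1-preview-2b")

prompts = [
    "La capitale de la France est",        # French
    "Die Hauptstadt von Österreich ist",   # German
]
for prompt in prompts:
    print(generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"])
```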
Frequently Asked Questions
Q: What makes this model unique?
Helium-1-preview-2b stands out for its efficient architecture optimized for edge devices while maintaining strong multilingual capabilities. It achieves competitive performance with larger models while using only 2B parameters.
Q: What are the recommended use cases?
The model is best suited for research and development in natural language processing, particularly in resource-constrained environments. As a base model, it requires additional fine-tuning or alignment before use in specific downstream applications.
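One possible starting point for such fine-tuning is sketched below, attaching LoRA adapters with the peft library. The target module names are an assumption (typical attention projection names) and should be checked against the actual checkpoint before training.

```python
# Hedged fine-tuning sketch: wrap the base model with LoRA adapters via peft.
# The target_modules names below are assumptions; inspect the model to confirm.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("kyutai/helium-1-preview-2b")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed projection names, verify first
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```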