Helium-1-preview-2b
Property | Value
---|---
Parameter Count | 2 billion |
Model Type | Large Language Model |
Languages | English, French, German, Italian, Portuguese, Spanish |
License | CC-BY 4.0 |
Context Length | 4096 tokens |
Model URL | https://huggingface.co/kyutai/helium-1-preview-2b |
What is Helium-1-preview-2b?
Helium-1-preview-2b is a lightweight multilingual language model designed for edge and mobile devices. Developed by Kyutai, this 2B-parameter base model targets resource-constrained environments while maintaining strong capabilities across six European languages (English, French, German, Italian, Portuguese, and Spanish).
Implementation Details
The model features a 24-layer architecture with 20 attention heads and a model dimension of 2560. It was trained on a diverse dataset including Wikipedia, Stack Exchange, open-access scientific articles, and Common Crawl, using JAX on 128 NVIDIA H100 GPUs. The model uses a context window of 4096 tokens and a RoPE theta value of 100,000.
- 24 transformer layers with 20 attention heads
- 2560 model dimension and 7040 MLP dimension
- Trained on high-quality multilingual datasets
- Optimized for edge deployment
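As an illustration of how such a checkpoint is typically used, the sketch below loads the model with the Hugging Face transformers library and prints its configuration. It assumes the checkpoint is compatible with the standard AutoModelForCausalLM and AutoTokenizer classes; exact config attribute names may differ from the figures quoted above.

```python
# Minimal loading sketch (assumes compatibility with the standard Auto* classes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyutai/helium-1-preview-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 2B parameters fit comfortably in bf16 on most GPUs
    device_map="auto",
)

# The hyperparameters listed above (24 layers, 20 heads, 2560 hidden size,
# 4096-token context) should be visible in the loaded configuration.
print(model.config)
```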
Core Capabilities
- Strong benchmark performance, with 60.7% average accuracy on English benchmarks
- Competitive results on MMLU, TriviaQA, and other standard benchmarks
- Efficient multilingual processing with support for 6 European languages
- Designed for edge deployment and mobile applications
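For a quick sense of the multilingual behavior described above, the snippet below runs greedy generation on prompts in two of the supported languages via the transformers text-generation pipeline. The prompts and decoding settings are illustrative only, and the sketch assumes the checkpoint is supported by the pipeline API.

```python
# Illustrative multilingual generation using the transformers text-generation
# pipeline (assumes the checkpoint is supported by the pipeline API).
from transformers import pipeline

generator = pipeline("text-generation", model="kyutai/helium-1-preview-2b")

prompts = [
    "La capitale de la France est",        # French
    "Die Hauptstadt von Österreich ist",   # German
]
for prompt in prompts:
    print(generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"])
```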
Frequently Asked Questions
Q: What makes this model unique?
Helium-1-preview-2b stands out for its efficient architecture optimized for edge devices while maintaining strong multilingual capabilities. It achieves competitive performance with larger models while using only 2B parameters.
Q: What are the recommended use cases?
The model is best suited for research and development in natural language processing, particularly in resource-constrained environments. As a base model, it requires additional fine-tuning or alignment before use in specific downstream applications.
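One possible starting point for such fine-tuning is sketched below, attaching LoRA adapters with the peft library. The target module names are an assumption (typical attention projection names) and should be checked against the actual checkpoint before training.

```python
# Hedged fine-tuning sketch: wrap the base model with LoRA adapters via peft.
# The target_modules names below are assumptions; inspect the model to confirm.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("kyutai/helium-1-preview-2b")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed projection names, verify first
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```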