OpenELM-3B

apple

OpenELM-3B is a 3.04B-parameter efficient language model from Apple, trained on 1.8T tokens and built around layer-wise scaling for improved accuracy.

Property	Value
Parameter Count	3.04B
License	Apple Sample Code License
Paper	arXiv:2404.14619
Training Data	1.8T tokens
Model Type	Transformer-based Language Model

What is OpenELM-3B?

OpenELM-3B is the largest publicly released model in Apple's OpenELM (Open Efficient Language Model) family, with 3.04 billion parameters. It uses a layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers instead of giving every layer the same configuration, with the aim of improving accuracy for a given parameter budget.
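As a rough illustration of layer-wise scaling, the sketch below linearly interpolates two per-layer knobs from the first layer to the last: an attention width factor (alpha) that sets the number of heads, and a feed-forward multiplier (beta). This follows the linear interpolation described in the OpenELM paper (arXiv:2404.14619), but the function name, the `LayerConfig` type, the simple rounding, and all numeric values are assumptions chosen for illustration, not the released OpenELM-3B configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerConfig:
    index: int
    num_heads: int
    ffn_multiplier: float


def layerwise_scaling(
    num_layers: int,
    d_model: int,
    head_dim: int,
    alpha_min: float,
    alpha_max: float,
    beta_min: float,
    beta_max: float,
) -> List[LayerConfig]:
    """Interpolate attention and FFN sizes linearly from first to last layer.

    Hypothetical helper: real implementations also round dimensions to
    hardware-friendly multiples, which this sketch skips.
    """
    configs = []
    for i in range(num_layers):
        # Position of layer i in [0, 1]; a single-layer model uses the minimum.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        num_heads = max(1, round(alpha * d_model / head_dim))
        configs.append(LayerConfig(index=i, num_heads=num_heads, ffn_multiplier=beta))
    return configs


# Illustrative values only (not the OpenELM-3B hyperparameters).
example = layerwise_scaling(
    num_layers=4, d_model=1024, head_dim=64,
    alpha_min=0.5, alpha_max=1.0, beta_min=0.5, beta_max=4.0,
)
for cfg in example:
    print(cfg)
```

Early layers end up narrower and later layers wider, so the total parameter budget is spent unevenly rather than split into identical blocks.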

Implementation Details

The model was pre-trained with the CoreNet library on a mix of public datasets: RefinedWeb, deduplicated PILE, a subset of RedPajama, and Dolma v1.6, totaling approximately 1.8 trillion tokens. At inference time it supports several generation strategies, including lookup token speculative generation, which can speed up decoding.

Layer-wise parameter scaling architecture
Compatible with Hugging Face's transformers library
Available in both pretrained (base) and instruction-tuned variants
Supports inference speed-ups such as lookup token speculative generation
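The idea behind lookup token speculative generation can be sketched in plain Python: find an earlier copy of the most recent n-gram in the token sequence, propose the tokens that followed it as drafts, and keep only the drafts the model agrees with. This is a minimal sketch of the concept, not the transformers implementation; `lookup_candidates`, `accept_drafts`, and the toy `next_token` callables are hypothetical names, and a real implementation verifies all drafts in a single batched forward pass rather than one call per token.

```python
from typing import Callable, List, Sequence


def lookup_candidates(
    tokens: Sequence[int], ngram_size: int = 2, num_draft: int = 3
) -> List[int]:
    """Propose draft tokens by finding an earlier copy of the trailing n-gram."""
    if len(tokens) <= ngram_size:
        return []
    tail = list(tokens[-ngram_size:])
    # Scan from the most recent position backwards, skipping the tail itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if list(tokens[start:start + ngram_size]) == tail:
            follow = start + ngram_size
            return list(tokens[follow:follow + num_draft])
    return []


def accept_drafts(
    tokens: Sequence[int],
    drafts: Sequence[int],
    next_token: Callable[[Sequence[int]], int],
) -> List[int]:
    """Return the tokens to append: model predictions up to the first mismatch.

    `next_token` is a stand-in for a greedy language model step.
    """
    if not drafts:
        # Nothing to speculate on: fall back to one ordinary decoding step.
        return [next_token(tokens)]
    context = list(tokens)
    accepted = []
    for draft in drafts:
        predicted = next_token(context)
        accepted.append(predicted)
        context.append(predicted)
        if predicted != draft:
            break  # The model disagrees; keep its token and stop.
    return accepted


history = [1, 2, 3, 4, 1, 2]
drafts = lookup_candidates(history)

# Toy "models" used only to exercise the acceptance logic.
cyclic_model = lambda ctx: ctx[-1] % 4 + 1   # repeats 1, 2, 3, 4, 1, ...
counting_model = lambda ctx: ctx[-1] + 1     # always counts upward

print(drafts)
print(accept_drafts(history, drafts, cyclic_model))
print(accept_drafts(history, drafts, counting_model))
```

The speed-up comes from repetitive text: when the drafts match, several tokens are confirmed per verification step, and the output is the same as ordinary greedy decoding either way.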

Core Capabilities

Zero-shot average of 67.39% across standard benchmarks
Challenging reasoning (ARC-c): 35.58%
Common sense (HellaSwag): 72.44%
Scientific knowledge (SciQ): 92.70%

Frequently Asked Questions

Q: What makes this model unique?

OpenELM-3B stands out for its layer-wise parameter allocation strategy and for shipping with a complete open framework that covers data preparation, training, fine-tuning, and evaluation. It targets strong accuracy at a modest parameter count.

Q: What are the recommended use cases?

The model is suited to text generation, reasoning tasks, and scientific question answering, particularly in applications that depend on zero-shot performance. It can be paired with speculative generation for faster inference.
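A minimal sketch of running the model through the transformers library with lookup-based speculative generation enabled might look like the following. The checkpoint name `apple/OpenELM-3B`, the Llama 2 tokenizer id, and the need for `trust_remote_code=True` are assumptions not stated above and should be checked against the official model card; `build_generate_kwargs` and `generate_text` are hypothetical helpers. The transformers import is deferred so the option-building helper can be used without the library installed.

```python
from typing import Any, Dict, Optional

MODEL_ID = "apple/OpenELM-3B"               # assumed Hub checkpoint name
TOKENIZER_ID = "meta-llama/Llama-2-7b-hf"   # assumed tokenizer; access is gated


def build_generate_kwargs(
    max_new_tokens: int = 64,
    prompt_lookup_num_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for `model.generate`."""
    kwargs: Dict[str, Any] = {"max_new_tokens": max_new_tokens, "do_sample": False}
    if prompt_lookup_num_tokens is not None:
        # Enables prompt lookup decoding in recent transformers releases.
        kwargs["prompt_lookup_num_tokens"] = prompt_lookup_num_tokens
    return kwargs


def generate_text(prompt: str, **options: Any) -> str:
    """Load the model and generate a completion (downloads several GB of weights)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **build_generate_kwargs(**options))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Once upon a time", prompt_lookup_num_tokens=10)` would then request speculative generation, while omitting the argument falls back to ordinary greedy decoding.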