OpenELM-3B

apple

OpenELM-3B is a 3.04B-parameter efficient language model from Apple, trained on approximately 1.8T tokens and built with layer-wise scaling to improve accuracy.

Property         Value
Parameter Count  3.04B
License          Apple Sample Code License
Paper            arXiv:2404.14619
Training Data    1.8T tokens
Model Type       Transformer-based Language Model

What is OpenELM-3B?

OpenELM-3B is the largest model in Apple's Open Efficient Language Model (OpenELM) family, with 3.04 billion parameters. It uses a layer-wise scaling strategy that allocates parameters non-uniformly across the transformer layers instead of giving every layer the same size, which is intended to improve accuracy for a given parameter budget.
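The idea of layer-wise scaling can be illustrated with a short sketch. It assumes the per-layer width is set by linearly interpolating two factors from the first layer to the last: one for the number of attention heads and one for the feed-forward (FFN) width. The function name, parameter names, default ranges, and toy dimensions below are illustrative assumptions, not the model's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class LayerConfig:
    num_heads: int          # attention heads in this layer
    ffn_multiplier: float   # FFN hidden width as a multiple of d_model


def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Linearly interpolate per-layer width from the first to the last layer.

    alpha and beta are (min, max) ranges chosen for illustration only.
    """
    if num_layers < 2:
        raise ValueError("need at least two layers to interpolate")
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)                   # 0.0 at first layer, 1.0 at last
        a = alpha[0] + (alpha[1] - alpha[0]) * t   # attention scaling factor
        b = beta[0] + (beta[1] - beta[0]) * t      # FFN scaling factor
        heads = max(1, round(a * d_model / head_dim))
        configs.append(LayerConfig(num_heads=heads, ffn_multiplier=b))
    return configs


# Toy dimensions (hypothetical): early layers are narrower, later layers wider.
layers = layerwise_scaling(num_layers=4, d_model=1024, head_dim=64)
for i, cfg in enumerate(layers):
    print(i, cfg.num_heads, round(cfg.ffn_multiplier, 2))
```

With a uniform design every layer would get the same head count and FFN width; here the same overall budget is shifted toward the later layers.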

Implementation Details

The model was pre-trained with the CoreNet library on a mix of RefinedWeb, deduplicated PILE, a RedPajama subset, and Dolma v1.6, totaling approximately 1.8 trillion tokens. It supports several generation strategies, including lookup token speculative generation, which can speed up inference.

  • Layer-wise parameter scaling across transformer layers
  • Compatible with Hugging Face's transformers library
  • Available in both pretrained (base) and instruction-tuned variants
  • Supports speculative generation to speed up inference
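The points above can be sketched as a small loading and generation helper. This is a minimal sketch, not an official recipe: it assumes the checkpoint is published as `apple/OpenELM-3B`, that it needs `trust_remote_code=True`, that it is paired with the Llama 2 tokenizer (`meta-llama/Llama-2-7b-hf`), and that lookup token speculative generation maps to the `prompt_lookup_num_tokens` argument of `generate` in recent transformers releases. The helper names `build_generate_kwargs` and `generate` are hypothetical.

```python
def build_generate_kwargs(max_new_tokens=64, speculative=False, lookup_tokens=10):
    """Assemble keyword arguments for `model.generate`."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if speculative:
        # Prompt-lookup decoding: candidate tokens are drafted from n-grams
        # that already appear in the prompt, then verified by the model.
        kwargs["prompt_lookup_num_tokens"] = lookup_tokens
    return kwargs


def generate(prompt,
             model_id="apple/OpenELM-3B",                # assumed checkpoint ID
             tokenizer_id="meta-llama/Llama-2-7b-hf",    # assumed tokenizer
             **options):
    """Load the model and tokenizer, then generate a continuation of `prompt`."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **build_generate_kwargs(**options))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate("Once upon a time there was", speculative=True)` would download the weights on first use. The tokenizer repository assumed here is gated, so an access token is likely required.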

Core Capabilities

  • Zero-shot average of 67.39% across standard benchmarks
  • Science reasoning on the challenge set (ARC-c): 35.58%
  • Commonsense inference (HellaSwag): 72.44%
  • Science question answering (SciQ): 92.70%

Frequently Asked Questions

Q: What makes this model unique?

OpenELM-3B stands out for its layer-wise parameter allocation and for being released with a complete open-source framework covering data preparation, training, fine-tuning, and evaluation. It aims for strong accuracy while remaining computationally efficient.

Q: What are the recommended use cases?

The model is suited to text generation, reasoning tasks, and scientific question answering, particularly in applications that depend on zero-shot performance. Speculative generation can be enabled for faster inference.
