OLMoE-1B-7B-0924-Instruct
| Property | Value |
|---|---|
| Parameter Count | 6.92B total (≈1B active per token) |
| Model Type | Mixture-of-Experts Language Model |
| License | Apache 2.0 |
| Paper | arXiv:2409.02060 |
| Tensor Type | BF16 |
What is OLMoE-1B-7B-0924-Instruct?
OLMoE-1B-7B-0924-Instruct is a Mixture-of-Experts (MoE) language model that delivers state-of-the-art results among models with comparable active-parameter counts while keeping inference costs low. Released in September 2024, it activates only about 1B of its 7B total parameters for each input token. The instruct variant was post-trained with Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO).
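For context, DPO tunes the model directly on preference pairs without training a separate reward model. The standard objective from Rafailov et al. (2023) is reproduced here for reference; this is the general formulation, not OLMoE-specific training code:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the preferred and rejected responses to prompt $x$, $\pi_{\text{ref}}$ is the frozen SFT model, and $\beta$ controls how far the tuned policy may drift from it.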
Implementation Details
The model's design and training pipeline combine:
- A Mixture-of-Experts architecture that activates only a subset of experts for each token
- Post-training with SFT followed by DPO (KTO-tuned variants are also released)
- A fully open release, including model weights, training data, code, and logs
- The standard transformers library interface with chat-template support, as shown in the sketch below
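A minimal inference sketch, assuming transformers v4.45 or later (the first release with OLMoE support) and the Hugging Face model id shown below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed above
    device_map="auto",
)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```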
Core Capabilities
- Competitive with much larger models such as Llama2-13B-Chat
- Strong results across multiple benchmarks, including MMLU (51.9%), GSM8k (45.5%), and BBH (37.0%)
- High win rate in model-judged preference evaluation (84.0% on AlpacaEval 1.0)
- Efficient inference, with only about 1B parameters active per token
Frequently Asked Questions
Q: What makes this model unique?
The model's Mixture-of-Experts architecture lets it match the output quality of much larger dense models while spending only a fraction of the compute at inference time: roughly 1B of its 7B total parameters are active for any given token. A sketch of how this kind of top-k expert routing works is shown below.
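To make "selective parameter activation" concrete, here is a minimal, self-contained sketch of top-k expert routing as used in MoE layers generally. It is illustrative only: the layer sizes and gating details are assumptions, not OLMoE's exact implementation (the paper reports 64 experts with 8 active per layer, which the defaults below mirror).

```python
# Illustrative top-k MoE routing layer (hypothetical, not OLMoE's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 1024,
                 n_experts: int = 64, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        logits = self.router(x)                                # (n_tokens, n_experts)
        weights, expert_idx = logits.topk(self.top_k, dim=-1)  # keep the best k experts
        weights = F.softmax(weights, dim=-1)                   # renormalize over those k
        out = torch.zeros_like(x)
        # Only the selected experts run for each token.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

Per token, only the selected experts execute, so compute scales with `top_k` rather than `n_experts`; this is how the model keeps roughly 1B of its 7B parameters active.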
Q: What are the recommended use cases?
The model is well suited to a wide range of natural language processing tasks, including general text generation, question answering, and conversational AI. It is a particularly good fit when computational efficiency is a priority but output quality cannot be compromised.