OLMoE-1B-7B-0924-Instruct

Maintained By
allenai

  • Parameter Count: 6.92B (1B active, 7B total)
  • Model Type: Mixture-of-Experts Language Model
  • License: Apache 2.0
  • Paper: arXiv:2409.02060
  • Tensor Type: BF16

What is OLMoE-1B-7B-0924-Instruct?

OLMoE-1B-7B-0924-Instruct is a Mixture-of-Experts (MoE) language model that delivers strong performance while keeping inference costs low. Released in September 2024, it activates only about 1B of its 7B total parameters for each input token. The instruct variant has been fine-tuned with Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO).

Implementation Details

The model's design centers on efficiency, combining the following architecture and training methodology:

  • Mixture-of-Experts architecture allowing for selective parameter activation
  • Trained using a combination of SFT and DPO/KTO optimization techniques
  • Fully open-source implementation with comprehensive documentation
  • Supports the standard transformers library interface, including chat template functionality (see the usage sketch below)
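
As a minimal illustration of that interface, the sketch below loads the model with transformers and generates a reply via its chat template. It assumes a recent transformers release with OLMoE support and a CUDA device with enough memory for the bfloat16 weights; the prompt and generation settings are illustrative only, not recommended defaults.

```python
# Minimal usage sketch; adjust device and dtype for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMoE-1B-7B-0924-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
).to("cuda")

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```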

Core Capabilities

  • Competitive performance with much larger models like Llama2-13B-Chat
  • Strong performance across multiple benchmarks including MMLU (51.9%), GSM8k (45.5%), and BBH (37.0%)
  • Excellent performance in human evaluation metrics (84.0% in Alpaca-Eval 1.0)
  • Efficient inference with only 1B active parameters

Frequently Asked Questions

Q: What makes this model unique?

The model's unique Mixture-of-Experts architecture allows it to achieve performance comparable to much larger models while only using a fraction of the computational resources during inference. It maintains 1B active parameters while having access to 7B total parameters, making it both efficient and powerful.
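
To make the idea of selective parameter activation concrete, here is a toy sketch of top-k expert routing in PyTorch. The expert count, layer sizes, and k are arbitrary illustrative values and do not reflect OLMoE's actual configuration; the point is only that each token is processed by a small subset of the expert parameters.

```python
# Toy top-k MoE routing sketch; sizes are illustrative, not OLMoE's real config.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        # The router scores every expert per token, but only the top-k experts
        # are actually evaluated, so most expert parameters stay inactive.
        scores = self.router(x)
        weights, chosen = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(4, 64)
print(layer(tokens).shape)  # torch.Size([4, 64])
```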

Q: What are the recommended use cases?

The model is well suited to a wide range of natural language processing tasks, including general text generation, question answering, and conversational AI. It is a particularly good fit when computational efficiency is a priority but high-quality outputs are still required.
