Doge-160M-Instruct

Maintained by: SmallDoge


Parameter Count: 160M
Model Type: Instruction-tuned Language Model
Architecture: Dynamic Mask Attention with Cross Domain MoE
Paper: Wonderful Matrices (2024)
Training Data: SmolTalk (SFT), UltraFeedback Binarized (DPO)

What is Doge-160M-Instruct?

Doge-160M-Instruct is a compact instruction-tuned language model that combines Dynamic Mask Attention with a Cross Domain Mixture of Experts. At 160M parameters, it delivers competitive benchmark results while remaining inexpensive to run.

Implementation Details

The model was developed through a two-stage training process: Supervised Fine-Tuning (SFT) on the SmolTalk dataset, followed by Direct Preference Optimization (DPO) on UltraFeedback Binarized. Training used bfloat16 precision, with a learning rate of 4e-4 for the SFT phase and 4e-5 for the DPO phase.

  • Dynamic Mask Attention allows switching between self-attention (training) and state space (inference)
  • Cross Domain Mixture of Experts inherits weights from Multi-Layer Perceptron
  • 2048 token context length for SFT, 1024 for DPO
  • Batch size of 0.25M tokens for SFT and 0.125M tokens for DPO
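For reference, the two-stage setup above can be captured as a plain configuration structure. This is only a summary of the values reported in this card; the key names are illustrative and not taken from the SmallDoge training code:

```python
# Reported two-stage training setup for Doge-160M-Instruct
# (values from this card; field names are illustrative only)
TRAINING = {
    "precision": "bfloat16",
    "sft": {
        "dataset": "SmolTalk",
        "learning_rate": 4e-4,
        "context_length": 2048,
        "batch_size_tokens": 0.25e6,
    },
    "dpo": {
        "dataset": "UltraFeedback Binarized",
        "learning_rate": 4e-5,
        "context_length": 1024,
        "batch_size_tokens": 0.125e6,
    },
}
```

Note that the DPO phase uses a 10x lower learning rate and half the context length and batch size of the SFT phase, a common pattern when preference tuning an already fine-tuned model.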

Core Capabilities

  • Achieves 16.8% on IFEval Prompt Strict Accuracy
  • 29.7% performance on MMLU benchmark
  • 42.8% accuracy on ARC tasks
  • 64.1% on PIQA evaluations
  • Processing speed of 28 tokens/s on an 11th-gen Intel i7 CPU

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its Dynamic Mask Attention mechanism that enables flexible switching between attention modes and its efficient Cross Domain Mixture of Experts architecture, allowing for strong performance despite its relatively small size.
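The general idea of having one parallel form for training and an equivalent sequential form for inference can be illustrated with a toy example. The sketch below is NOT the actual Dynamic Mask Attention implementation (which switches to a state-space formulation); it only shows, with plain causal attention and a key/value cache, how the same computation can run in both modes and produce identical results:

```python
import numpy as np

def causal_attention(Q, K, V):
    """Parallel 'training mode': full causal softmax attention over the whole sequence."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # causal mask
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def attention_step(q, k_cache, v_cache):
    """Sequential 'inference mode': one query token against a growing cache."""
    scores = q @ np.array(k_cache).T / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ np.array(v_cache)

rng = np.random.default_rng(0)
T, d = 5, 8
Q, K, V = rng.normal(size=(3, T, d))

parallel = causal_attention(Q, K, V)

k_cache, v_cache, steps = [], [], []
for t in range(T):  # feed tokens one at a time, as during generation
    k_cache.append(K[t])
    v_cache.append(V[t])
    steps.append(attention_step(Q[t], k_cache, v_cache))
sequential = np.array(steps)

assert np.allclose(parallel, sequential)  # same outputs, two execution modes
```

DMA takes this idea further: instead of a cache that grows with sequence length, the inference-time form uses a fixed-size state, which is where the efficiency gain comes from.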

Q: What are the recommended use cases?

The model is particularly well-suited for instruction-following tasks, general language understanding, and applications requiring a balance between performance and computational efficiency. It's especially valuable in scenarios where resource constraints are important considerations.
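As a usage sketch: the snippet below assumes the model is published under the `SmallDoge/Doge-160M-Instruct` repo id on the Hugging Face Hub and exposes the standard `transformers` chat interface; since Doge is a custom architecture, `trust_remote_code=True` is assumed to be required. Verify both against the actual model card before relying on this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id; the custom Doge architecture needs trust_remote_code
model_id = "SmallDoge/Doge-160M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [
    {"role": "user", "content": "Explain what a Mixture of Experts is in one sentence."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(reply)
```

At 160M parameters the model loads comfortably on CPU, which matches its positioning for resource-constrained deployments.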
