Doge-160M-Instruct

Doge-160M-Instruct is a 160M-parameter language model from the SmallDoge project. It uses Dynamic Mask Attention and a Cross Domain Mixture of Experts, and is instruction-tuned on the SmolTalk and UltraFeedback datasets.

| Property | Value |
|---|---|
| Parameter Count | 160M |
| Model Type | Instruction-tuned Language Model |
| Architecture | Dynamic Mask Attention with Cross Domain MoE |
| Paper | Wonderful Matrices (2024) |
| Training Data | SmolTalk (SFT), UltraFeedback Binarized (DPO) |

What is Doge-160M-Instruct?

Doge-160M-Instruct combines Dynamic Mask Attention with a Cross Domain Mixture of Experts to get strong performance from a compact model. At 160M parameters it posts competitive results on standard benchmarks while keeping compute and memory requirements low enough for CPU inference.

Implementation Details

The model was trained in two stages: Supervised Fine-Tuning (SFT) on the SmolTalk dataset, followed by Direct Preference Optimization (DPO) on UltraFeedback Binarized. Training used bfloat16 precision, with a learning rate of 4e-4 for the SFT phase and 4e-5 for the DPO phase.

  • Dynamic Mask Attention lets the model use self-attention during training and switch to a state-space formulation during inference
  • Cross Domain Mixture of Experts inherits weights from Multi-Layer Perceptron
  • 2048 token context length for SFT, 1024 for DPO
  • Batch size of 0.25M for SFT and 0.125M for DPO
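Assuming the batch sizes above are denominated in tokens (the card does not state the unit) and sequences are packed to the full context length, the per-batch sequence counts work out roughly as follows:

```python
# Back-of-envelope sequence counts per batch. Assumptions: batch sizes are
# in tokens, and sequences are packed to the full context length.
sft_batch_tokens = 250_000   # 0.25M per SFT batch
dpo_batch_tokens = 125_000   # 0.125M per DPO batch
sft_context = 2048           # SFT context length
dpo_context = 1024           # DPO context length

sft_seqs = sft_batch_tokens // sft_context
dpo_seqs = dpo_batch_tokens // dpo_context
print(sft_seqs, dpo_seqs)  # → 122 122
```

Under these assumptions, both phases see a similar number of sequences per step; DPO halves both the context and the token budget.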

Core Capabilities

  • Achieves 16.8% on IFEval Prompt Strict Accuracy
  • 29.7% performance on MMLU benchmark
  • 42.8% accuracy on ARC tasks
  • 64.1% on PIQA evaluations
  • Processing speed of 28 tokens/s on an 11th-gen Intel i7 CPU
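The reported throughput gives a rough sense of interactive latency. Taking a 256-token reply as a hypothetical length (not a figure from the card):

```python
# Rough latency estimate from the reported CPU throughput.
tokens_per_second = 28   # reported on an 11th-gen Intel i7 CPU
reply_tokens = 256       # hypothetical reply length, for illustration

latency_s = reply_tokens / tokens_per_second
print(f"{latency_s:.1f} s")  # → 9.1 s
```

So a medium-length reply completes in under ten seconds on commodity hardware, which is the kind of trade-off this model targets.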

Frequently Asked Questions

Q: What makes this model unique?

Its Dynamic Mask Attention mechanism switches between self-attention for training and a state-space formulation for inference, and its Cross Domain Mixture of Experts inherits weights from the MLP layers. Together these let the model perform strongly despite its relatively small size.
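As a toy illustration of mode-dependent masking (not the model's actual mechanism, which is described in the Wonderful Matrices paper), the sketch below contrasts a full causal attention mask with a restricted mask that keeps only recent positions, a rough stand-in for a cheaper inference path:

```python
import numpy as np

def attention(q, k, v, mask):
    # Scaled dot-product attention with a boolean mask (illustrative only).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)      # masked positions get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

T, d = 6, 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))

# "Training" mode: full causal self-attention over all past positions.
causal = np.tril(np.ones((T, T), dtype=bool))

# "Inference" mode: keep only a small window of recent positions
# (a crude stand-in for the state-space style path; window=2 is arbitrary).
window = 2
dynamic = causal & (np.arange(T)[None, :] > np.arange(T)[:, None] - window)

out_train = attention(q, k, v, causal)
out_infer = attention(q, k, v, dynamic)
print(out_train.shape, out_infer.shape)  # → (6, 8) (6, 8)
```

The two modes share weights and produce outputs of the same shape; only the set of positions each token attends to changes.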

Q: What are the recommended use cases?

The model is particularly well-suited for instruction-following tasks, general language understanding, and applications requiring a balance between performance and computational efficiency. It's especially valuable in scenarios where resource constraints are important considerations.
