Qwen2.5-MOE-6x1.5B-DeepSeek-Reasoning-e32-8.71B-gguf

DavidAU

Experimental 8.71B-parameter MOE model combining six Qwen 2.5 1.5B models with DeepSeek reasoning capabilities. Features a 128k context window and uncensored output.

Property | Value
Total Parameters | 8.71B
Base Architecture | Mixture of Experts (MOE)
Context Length | 128k tokens
Quantizations | Q4_K_S, Q8_0

What is Qwen2.5-MOE-6x1.5B-DeepSeek-Reasoning-e32-8.71B-gguf?

This is an experimental Mixture of Experts (MOE) model that combines six Qwen 2.5 1.5B models into a unified 8.71B parameter model. Its reasoning focus comes from DeepSeek-distilled experts, and it supports an extensive 128k token context window. The model comprises one captain/controller model with a shared expert plus five main expert models; since MOE experts typically share attention and embedding layers, the total parameter count comes in below a naive 6 × 1.5B = 9B.

Implementation Details

The model architecture consists of multiple specialized components working in concert. The captain/controller (with a .01 shared expert) uses DeepSeek-R1-Distill-Qwen-1.5B-uncensored, while the main experts are various fine-tuned Qwen and DeepSeek models. By default, 4 of the 6 experts are active, though all 6 can be enabled for best quality (see the loading sketch after the list below).

  • Float32 mastering for improved generation quality
  • Supports multiple templating systems including Jinja, Llama 3, and ChatML
  • Recommended temperature range of 0.4 to 0.8
  • Minimum context setting of 4k tokens, with 8k+ recommended
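
To make these settings concrete, here is a minimal loading sketch using llama-cpp-python that applies the recommended context and temperature and raises the active expert count. The model filename and the expert-count metadata key are assumptions rather than details from this card; verify both against your GGUF file's metadata before relying on them.

```python
# A minimal sketch, assuming llama-cpp-python and a local copy of the
# Q4_K_S quant. The filename and the "qwen2moe.expert_used_count" key
# are assumptions (the key follows llama.cpp's "<arch>.expert_used_count"
# pattern); check your file's metadata for the exact names.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-MOE-6x1.5B-DeepSeek-Reasoning-e32-8.71B-Q4_K_S.gguf",
    n_ctx=8192,  # the card recommends 8k+ context (4k minimum)
    # Raise the active expert count from the default 4 to all 6.
    kv_overrides={"qwen2moe.expert_used_count": 6},
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a careful step-by-step reasoner."},
        {"role": "user", "content": "List three trade-offs of MOE models."},
    ],
    temperature=0.6,  # within the recommended 0.4 to 0.8 range
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```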

Core Capabilities

  • Advanced reasoning and analytical processing
  • Uncensored output generation
  • Multi-step thinking and problem decomposition
  • Flexible template support (see the ChatML sketch after this list)
  • Extended context handling up to 128k tokens
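
As an illustration of the template support, a ChatML-style prompt (the format Qwen 2.5 models ship with) is laid out as in the sketch below; the system and user text are arbitrary placeholders, not wording from this card.

```python
# Illustrative ChatML prompt layout (the template family Qwen 2.5 uses);
# the system and user messages here are arbitrary placeholder text.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant that reasons step by step.<|im_end|>\n"
    "<|im_start|>user\n"
    "Break this problem into smaller steps: ...<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```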

Frequently Asked Questions

Q: What makes this model unique?

The model's unique strength lies in its MOE architecture that combines six specialized 1.5B parameter models, each contributing different expertise while maintaining a relatively small total parameter count. This allows for sophisticated reasoning capabilities typically associated with much larger models.

Q: What are the recommended use cases?

The model is versatile and can be used for general-purpose tasks, but it particularly excels in scenarios requiring complex reasoning, multi-step thinking, and detailed analysis. Note that output quality can range from basic to exceptional depending on the specific task and configuration.
