sparsetral-16x7B-v2

serpdotai

A 9.39B parameter MoE-adapted Mistral model with 16 experts, trained on OpenHermes-2.5. Optimized for efficient instruction following and conversational tasks.

Property         Value
Parameter Count  9.39B
License          Apache 2.0
Base Model       Mistral-7B-Instruct-v0.2
Training Data    OpenHermes-2.5
Paper            View Research Paper

What is sparsetral-16x7B-v2?

sparsetral-16x7B-v2 is a language model that adds a Mixture-of-Experts (MoE) layer of 16 expert adapters on top of the Mistral-7B base model. It follows a parameter-efficient sparsity crafting approach, combining QLoRA with MoE adapters to improve performance while keeping training and inference costs modest.

Implementation Details

The model was trained with a forked version of unsloth for efficient training on 8 A6000 GPUs. Key hyperparameters: sequence length of 4096, effective batch size of 128, and a 2e-5 learning rate with linear decay. The MoE adaptation uses 16 experts with top-k routing (k=4) and an adapter dimension of 512.

  • QLoRA training with rank 64 and alpha 16
  • MoE adapters and routers trained in bf16 format
  • Custom prompt format using im_start and im_end tokens
  • Optimized for 4096 token context window
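The top-k routing described above can be sketched as follows. This is an illustrative NumPy mock-up of softmax routing over 16 experts with k=4 and a hidden size of 512, not the model's actual implementation; the function and variable names are made up for the example.

```python
import numpy as np

def top_k_route(x, router_w, expert_fns, k=4):
    """Route a token representation x to the top-k experts.

    x: (hidden,) token representation
    router_w: (hidden, num_experts) router weights
    expert_fns: list of callables, one per expert (the MoE adapters)
    """
    logits = x @ router_w                      # (num_experts,) router scores
    top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts only
    # Weighted sum of the selected experts' adapter outputs
    return sum(g * expert_fns[i](x) for g, i in zip(gates, top))

# Toy usage: 16 dummy "experts" over a hidden size of 512
rng = np.random.default_rng(0)
hidden, num_experts = 512, 16
router_w = rng.normal(size=(hidden, num_experts))
experts = [lambda v, s=i: v * 0.01 * s for i in range(num_experts)]
out = top_k_route(rng.normal(size=hidden), router_w, experts)
print(out.shape)  # (512,)
```

Because only 4 of the 16 experts run per token, the compute per token stays close to a dense 7B model even though total parameters reach 9.39B.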

Core Capabilities

  • Advanced text generation with improved parameter efficiency
  • Optimized for instruction-following tasks
  • Enhanced conversational abilities
  • Efficient routing between expert networks

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness lies in its parameter-efficient sparsity crafting approach, combining MoE architecture with QLoRA fine-tuning. This allows for enhanced performance while maintaining reasonable computational requirements.
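The QLoRA configuration mentioned above (rank 64, alpha 16) corresponds to a low-rank weight update W' = W + (alpha/r)·B·A applied on top of a frozen (quantized) base weight. A minimal NumPy illustration, with toy shapes chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 32, 8, 16   # toy shapes; the model itself uses r=64, alpha=16

W = rng.normal(size=(d_out, d_in))      # frozen base weight (4-bit quantized in QLoRA)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # B initialized to zero, so W' == W at the start

W_adapted = W + (alpha / r) * (B @ A)   # effective weight after merging the adapter
assert np.allclose(W_adapted, W)        # no change until B is trained
```

Only A and B (r·(d_in + d_out) values per layer) are trained, which is why the approach stays memory-efficient despite the 16-expert expansion.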

Q: What are the recommended use cases?

The model is particularly well-suited for conversational AI applications, instruction-following tasks, and general text generation scenarios where efficient parameter usage is crucial.
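For conversational use, prompts follow the im_start/im_end format noted in the implementation details. A small helper sketch; the exact `<|im_start|>` / `<|im_end|>` spelling is the common ChatML convention and is an assumption here:

```python
def build_prompt(system, user):
    # ChatML-style format using im_start / im_end delimiter tokens.
    # The "<|im_start|>" spelling is the usual convention (assumption).
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt("You are a helpful assistant.",
                      "Summarize Mixture-of-Experts in one line.")
print(prompt)
```

The prompt ends with an open assistant turn, so generation continues from there and stops at the next `<|im_end|>`.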
