distilbart-mnli-12-9

valhalla

Distilled version of BART-large-MNLI with 12 encoder and 9 decoder layers, achieving 89.56% matched accuracy on MNLI through selective layer copying and fine-tuning.

  • Author: valhalla
  • Model Type: Distilled Language Model
  • Architecture: 12 encoder layers, 9 decoder layers
  • Base Model: BART-large-MNLI
  • Model Link: Hugging Face

What is distilbart-mnli-12-9?

DistilBART-MNLI-12-9 is a carefully optimized distilled version of the BART-large-MNLI model, created using the No Teacher Distillation technique. This model maintains impressive performance while reducing computational requirements through strategic layer selection. It achieves 89.56% matched accuracy and 89.52% mismatched accuracy, coming remarkably close to its larger parent model's performance.
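The most common way to use an MNLI-trained checkpoint like this one is zero-shot classification through the Hugging Face transformers pipeline. A minimal sketch, assuming `transformers` and a PyTorch backend are installed (the input sentence and candidate labels are illustrative):

```python
# Zero-shot classification with distilbart-mnli-12-9 via the
# transformers pipeline (requires: pip install transformers torch).
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="valhalla/distilbart-mnli-12-9",
)

# Illustrative input; any text and label set can be used.
result = classifier(
    "The new laptop's battery lasts all day on a single charge.",
    candidate_labels=["technology", "sports", "politics"],
)
print(result["labels"][0])  # label the model ranks most likely
```

The pipeline returns a dict with the input `sequence`, the `labels` sorted by score, and the corresponding `scores`.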

Implementation Details

The model employs a unique distillation approach where alternating layers from the original BART-large-MNLI are selectively copied and then fine-tuned. This implementation retains 12 encoder layers and 9 decoder layers, striking an optimal balance between model size and performance.

  • Utilizes No Teacher Distillation technique
  • Maintains high performance with minimal accuracy drop (less than 0.4% from original)
  • Efficiently preserves model capabilities through strategic layer selection
  • Supports fine-tuning on MNLI tasks
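The layer-selection step above can be illustrated with a small helper. Note this evenly-spaced index choice is an assumption for illustration; the exact mapping used by the distilbart training scripts may differ:

```python
def pick_layers_to_copy(n_student: int, n_teacher: int) -> list[int]:
    """Pick n_student teacher-layer indices spread evenly across
    n_teacher layers, always keeping the first and last layer.
    Illustrative only: the real distilbart mapping may differ."""
    if n_student == 1:
        return [0]
    # Integer arithmetic keeps the selection deterministic.
    return [i * (n_teacher - 1) // (n_student - 1) for i in range(n_student)]

# distilbart-mnli-12-9: all 12 encoder layers, 9 of 12 decoder layers.
print(pick_layers_to_copy(12, 12))  # encoder: copy every layer
print(pick_layers_to_copy(9, 12))   # decoder: 9 evenly spaced layers
```

The copied layers initialize the student model, which is then fine-tuned on MNLI rather than distilled against teacher logits, hence "No Teacher" distillation.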

Core Capabilities

  • Natural Language Inference tasks
  • High accuracy on both matched (89.56%) and mismatched (89.52%) datasets
  • Efficient processing with reduced parameter count
  • Compatible with standard fine-tuning procedures
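Under the hood, zero-shot classification with an MNLI model scores each candidate label as an NLI hypothesis (e.g. "This example is {label}.") against the input, then takes a softmax over the entailment logits across labels. A minimal sketch with made-up logits (the numbers are illustrative, not real model outputs):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical per-label NLI logits in the order
# [contradiction, neutral, entailment]. Illustrative values only.
nli_logits = {
    "travel":  [-2.1, -0.5, 3.0],
    "cooking": [2.4, 0.1, -1.8],
}

# Single-label mode: softmax the entailment logits across labels.
labels = list(nli_logits)
scores = softmax([nli_logits[lab][2] for lab in labels])
ranked = sorted(zip(labels, scores), key=lambda p: -p[1])
print(ranked)
```

Because the scores come from one softmax over all candidate labels, they sum to 1 and can be read as class probabilities.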

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to maintain near-original performance while reducing complexity: alternating layers are copied from BART-large-MNLI via the No Teacher Distillation technique and then fine-tuned, resulting in less than a 0.4% drop in matched accuracy.

Q: What are the recommended use cases?

This model is particularly well-suited for Natural Language Inference tasks where computational efficiency is important. It's ideal for applications requiring high-quality inference capabilities while maintaining reasonable resource requirements.
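For plain premise/hypothesis inference (rather than zero-shot classification), the model can also be called directly. A sketch assuming the standard transformers API and that the checkpoint exposes the usual three-way NLI labels via `model.config.id2label` (the example sentences are illustrative):

```python
# Direct NLI inference with distilbart-mnli-12-9
# (requires: pip install transformers torch).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "valhalla/distilbart-mnli-12-9"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the pair; MNLI models take premise and hypothesis together.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, num_labels]

probs = logits.softmax(dim=-1)[0]
label = model.config.id2label[int(probs.argmax())]
print(label, probs.tolist())
```

This form is useful when the task really is sentence-pair entailment, while the zero-shot pipeline is the better fit for arbitrary label sets.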
