ModernBERT-large-zeroshot-v2.0

Property	Value
Author	MoritzLaurer
Model Type	Zero-shot Text Classification
Base Architecture	ModernBERT-large
Average Accuracy	85%
Model Link	Hugging Face

What is ModernBERT-large-zeroshot-v2.0?

ModernBERT-large-zeroshot-v2.0 is a language model fine-tuned for zero-shot text classification. It is built on answerdotai's ModernBERT-large, and its main advantage over alternatives such as DeBERTav3 is speed and memory efficiency. It accepts inputs of up to 8k tokens and reaches 85% average accuracy across the evaluated classification tasks.

Implementation Details

The model was trained for 2 epochs with the AdamW optimizer, a learning rate of 9e-06, a batch size of 32, and a linear learning rate schedule with a 6% warmup ratio. Running in bf16 precision gave an approximately 2x speedup over fp16. Reported results:

Processes 1116 texts per second on an A100 40GB GPU
Achieves 85% mean accuracy across diverse tasks
Excels in sentiment analysis with 96.4% accuracy on Amazon Polarity
Strong performance on NLI tasks (94.2% accuracy on MNLI matched)
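
The learning rate schedule described above can be sketched in plain Python. This is a minimal illustration, not the author's training code: the function name `linear_schedule_lr` and the example run of 1,000 optimizer steps are hypothetical (the number of training steps is not stated here), and only the peak learning rate of 9e-06 and the 6% warmup ratio come from the text.

```python
import math

PEAK_LR = 9e-6        # learning rate stated in the text
WARMUP_RATIO = 0.06   # 6% warmup stated in the text


def linear_schedule_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup to the peak, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from the peak back down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 1,000 optimizer steps (the real step count is not given).
for step in (0, 30, 60, 530, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```

With these assumed values the rate climbs for the first 60 steps, peaks at 9e-06, and then falls linearly to zero at the final step.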

Core Capabilities

Zero-shot text classification across multiple domains
Efficient handling of long text sequences (8k context window)
High-speed inference with low memory usage
Robust performance on sentiment analysis, topic classification, and hate speech detection
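
As a rough illustration of the zero-shot classification capability listed above, the following sketch uses the Hugging Face `transformers` zero-shot pipeline. Several things here are assumptions rather than facts from this page: the Hub ID `MoritzLaurer/ModernBERT-large-zeroshot-v2.0` is inferred from the author and model name, the helper names `load_classifier` and `top_label` are hypothetical, and the hypothesis template is only an example. It also assumes a `transformers` version recent enough to support ModernBERT.

```python
# Assumed Hub ID, inferred from the author and model name given above.
MODEL_ID = "MoritzLaurer/ModernBERT-large-zeroshot-v2.0"


def load_classifier(model_id=MODEL_ID):
    """Build a zero-shot classification pipeline (downloads the model on first use)."""
    # Imported lazily so that top_label() can be exercised without transformers installed.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_id)


def top_label(classifier, text, candidate_labels, template="This text is about {}"):
    """Return the highest-scoring label and its score for a single text."""
    result = classifier(
        text,
        candidate_labels,
        hypothesis_template=template,  # example template, not prescribed by this page
        multi_label=False,             # scores are normalized across the candidate labels
    )
    # The pipeline returns labels and scores sorted from most to least likely.
    return result["labels"][0], result["scores"][0]
```

A call such as `top_label(load_classifier(), "The central bank raised interest rates again.", ["economy", "sports", "cooking"])` would return the best-matching label together with its score. No labeled training data for these classes is needed, which is the point of the zero-shot setup.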

Frequently Asked Questions

Q: What makes this model unique?

The model pairs high speed and memory efficiency with strong classification accuracy. It processes text several times faster than DeBERTav3 while remaining competitive in accuracy (85% on average).

Q: What are the recommended use cases?

The model is well suited to zero-shot classification, particularly where large volumes of text must be processed quickly. It performs especially well on sentiment analysis, topic classification, and natural language inference tasks.
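
To get a feel for what the reported throughput means for large text volumes, the arithmetic can be sketched as below. The helper name `estimated_minutes` is hypothetical, and the estimate assumes the 1116 texts per second reported for an A100 40GB GPU holds constant. Real throughput will vary with text length, batch size, precision, and hardware.

```python
REPORTED_TEXTS_PER_SECOND = 1116  # figure reported for an A100 40GB GPU


def estimated_minutes(num_texts, texts_per_second=REPORTED_TEXTS_PER_SECOND):
    """Rough wall-clock minutes to classify `num_texts` at a constant throughput."""
    if texts_per_second <= 0:
        raise ValueError("texts_per_second must be positive")
    return num_texts / texts_per_second / 60


# One million texts at the reported rate: roughly 15 minutes.
print(f"{estimated_minutes(1_000_000):.1f} minutes")
```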