deberta-v3-base-prompt-injection-v2

Property	Value
Parameter Count	184M
License	Apache 2.0
Base Model	microsoft/deberta-v3-base
Accuracy	95.25%
F1 Score	95.49%

What is deberta-v3-base-prompt-injection-v2?

This is a specialized security model fine-tuned from DeBERTa-v3-base, designed to detect and classify prompt injection attacks in language models. It's trained on a comprehensive dataset combining multiple public sources and achieves impressive accuracy in identifying malicious prompt manipulations.

Implementation Details

The model represents a significant advancement in LLM security, utilizing the DeBERTa-v3 architecture with 184M parameters. It processes inputs as binary classifications: benign (0) or injection-detected (1), and has been optimized through testing of over 20 different configurations.

Trained on 7 diverse datasets including VMware/open-instruct and Huggingface data
Supports both Transformers and ONNX runtime implementations
Maximum sequence length of 512 tokens
Includes Langchain and LLM Guard integration options

Core Capabilities

Binary classification of prompt injection attempts
High precision (91.59%) and recall (99.74%)
Optimized for English language inputs
Compatible with major ML frameworks

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized focus on prompt injection detection, achieving over 95% accuracy on unseen data, and its comprehensive training on diverse datasets including real-world attack patterns.

Q: What are the recommended use cases?

The model is ideal for securing LLM applications, particularly for detecting malicious prompt modifications in production environments. However, it's not recommended for system prompts due to potential false positives.