deberta-v3-base-prompt-injection-v2
Property | Value |
---|---|
Parameter Count | 184M |
License | Apache 2.0 |
Base Model | microsoft/deberta-v3-base |
Accuracy | 95.25% |
F1 Score | 95.49% |
What is deberta-v3-base-prompt-injection-v2?
This is a specialized security model fine-tuned from DeBERTa-v3-base, designed to detect and classify prompt injection attacks in language models. It's trained on a comprehensive dataset combining multiple public sources and achieves impressive accuracy in identifying malicious prompt manipulations.
Implementation Details
The model represents a significant advancement in LLM security, utilizing the DeBERTa-v3 architecture with 184M parameters. It processes inputs as binary classifications: benign (0) or injection-detected (1), and has been optimized through testing of over 20 different configurations.
- Trained on 7 diverse datasets including VMware/open-instruct and Huggingface data
- Supports both Transformers and ONNX runtime implementations
- Maximum sequence length of 512 tokens
- Includes Langchain and LLM Guard integration options
Core Capabilities
- Binary classification of prompt injection attempts
- High precision (91.59%) and recall (99.74%)
- Optimized for English language inputs
- Compatible with major ML frameworks
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specialized focus on prompt injection detection, achieving over 95% accuracy on unseen data, and its comprehensive training on diverse datasets including real-world attack patterns.
Q: What are the recommended use cases?
The model is ideal for securing LLM applications, particularly for detecting malicious prompt modifications in production environments. However, it's not recommended for system prompts due to potential false positives.