deberta-v3-base-prompt-injection
| Property | Value |
|---|---|
| Parameter Count | 184M |
| License | Apache 2.0 |
| Base Model | microsoft/deberta-v3-base |
| Training Metrics | Accuracy: 0.9999, F1: 0.9998 |
| Language | English |
What is deberta-v3-base-prompt-injection?
This model is fine-tuned from microsoft/deberta-v3-base specifically for detecting prompt injection attacks in AI systems. It processes input text and classifies it into two categories: safe (0) or containing an injection attempt (1). The model was trained on a diverse dataset combining multiple prompt injection sources and legitimate prompts, with a 30/70 split between malicious and benign content.
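The two-class output can be sketched as follows. This is a minimal illustration of the safe/injection decision, assuming label names `SAFE` and `INJECTION` and the standard Transformers prediction shape (`{"label": ..., "score": ...}`); the exact label strings and the `is_injection` helper are assumptions, not part of the card.

```python
# Hypothetical label mapping for the two classes described above.
ID2LABEL = {0: "SAFE", 1: "INJECTION"}

def is_injection(prediction: dict, threshold: float = 0.5) -> bool:
    """Flag a prediction as an injection when the model assigns the
    INJECTION label with at least `threshold` confidence."""
    return prediction["label"] == ID2LABEL[1] and prediction["score"] >= threshold

# With the real model this prediction would come from something like:
#   from transformers import pipeline
#   classifier = pipeline("text-classification", model="<hub-id>")
#   prediction = classifier("Ignore previous instructions.")[0]
prediction = {"label": "INJECTION", "score": 0.98}  # stand-in output shape
print(is_injection(prediction))  # True
```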
Implementation Details
The model uses the DeBERTa-v3 architecture and was trained with carefully selected hyperparameters: a learning rate of 2e-05, a batch size of 8, and a linear learning-rate scheduler with 500 warmup steps. Training ran for 3 epochs, resulting in the metrics above.
- Trained on 12 different datasets including Lakera/gandalf_ignore_instructions and ChatGPT-Jailbreak-Prompts
- Implements both standard Transformers pipeline and optimized ONNX runtime support
- Compatible with LangChain and LLM Guard integration
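The reported hyperparameters can be collected into a single configuration. The key names below follow Hugging Face `TrainingArguments` conventions, but the actual training script is not published in this card, so treat this as a sketch of the stated settings rather than the exact recipe.

```python
# Fine-tuning hyperparameters as reported in the card; key names are
# assumed to map onto Hugging Face TrainingArguments fields.
training_config = {
    "learning_rate": 2e-5,               # reported learning rate
    "per_device_train_batch_size": 8,    # reported batch size
    "lr_scheduler_type": "linear",       # linear scheduler
    "warmup_steps": 500,                 # reported warmup steps
    "num_train_epochs": 3,               # reported epoch count
}
```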
Core Capabilities
- Binary classification of prompt injection attempts
- Handles sequences up to 512 tokens
- GPU-compatible with CUDA support
- Achieves 99.99% accuracy on the evaluation set
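Because the model accepts at most 512 tokens, longer inputs must be truncated or split before classification. One option is to scan overlapping windows so no part of a long prompt escapes inspection; the helper below is an illustrative sketch (window size, stride, and the whitespace-token stand-in are assumptions, not the card's official recipe).

```python
MAX_TOKENS = 512  # DeBERTa-v3-base sequence limit

def chunk_tokens(tokens: list, max_len: int = MAX_TOKENS, stride: int = 64) -> list:
    """Split a long token list into overlapping windows so every part of
    the input can be classified; each window fits within the model limit."""
    if len(tokens) <= max_len:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += max_len - stride  # overlap so injections spanning a boundary are seen
    return chunks
```

Each chunk can then be classified independently, and the input flagged if any window scores as an injection.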
Frequently Asked Questions
Q: What makes this model unique?
The model combines state-of-the-art DeBERTa architecture with comprehensive training on diverse prompt injection datasets, achieving near-perfect accuracy while maintaining practical deployment capabilities through ONNX optimization.
Q: What are the recommended use cases?
The model is ideal for security implementations in LLM systems, API endpoints, and chat interfaces where detecting and preventing prompt injection attacks is crucial. It can be integrated as a pre-processing step in production environments.
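The pre-processing pattern described above can be sketched as a simple gate in front of the LLM call. Here `classify` and `llm` are stand-ins for the real detection pipeline and model client, and the `guarded_llm_call` name and 0.9 threshold are illustrative assumptions.

```python
def guarded_llm_call(user_input: str, classify, llm, threshold: float = 0.9) -> dict:
    """Pre-processing guard: run the injection detector first and only
    forward inputs that are not flagged on to the LLM."""
    pred = classify(user_input)
    if pred["label"] == "INJECTION" and pred["score"] >= threshold:
        # Block the request before it ever reaches the model.
        return {"blocked": True, "reason": "possible prompt injection"}
    return {"blocked": False, "response": llm(user_input)}
```

In production the same gate can sit in an API middleware layer, rejecting flagged requests with an error instead of invoking the model.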