deberta-v3-base-prompt-injection
| Property | Value |
|---|---|
| Parameter Count | 184M |
| License | Apache 2.0 |
| Base Model | microsoft/deberta-v3-base |
| Training Metrics | Accuracy: 0.9999, F1: 0.9998 |
| Language | English |
What is deberta-v3-base-prompt-injection?
This model is fine-tuned from microsoft/deberta-v3-base specifically for detecting prompt injection attacks in AI systems. It processes input text and classifies it into two categories: safe (0) or containing an injection attempt (1). The model was trained on a diverse dataset combining multiple prompt injection sources and legitimate prompts, with a 30/70 split between malicious and benign content.
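The two-class output can be sketched as follows. This is a minimal illustration of the safe/injection decision, assuming label names `SAFE` and `INJECTION` and the standard Transformers prediction shape (`{"label": ..., "score": ...}`); the exact label strings and the `is_injection` helper are assumptions, not part of the card.

```python
# Hypothetical label mapping for the two classes described above.
ID2LABEL = {0: "SAFE", 1: "INJECTION"}

def is_injection(prediction: dict, threshold: float = 0.5) -> bool:
    """Flag a prediction as an injection when the model assigns the
    INJECTION label with at least `threshold` confidence."""
    return prediction["label"] == ID2LABEL[1] and prediction["score"] >= threshold

# With the real model this prediction would come from something like:
#   from transformers import pipeline
#   classifier = pipeline("text-classification", model="<hub-id>")
#   prediction = classifier("Ignore previous instructions.")[0]
prediction = {"label": "INJECTION", "score": 0.98}  # stand-in output shape
print(is_injection(prediction))  # True
```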
Implementation Details
The model uses the DeBERTa-v3 architecture and was trained with carefully selected hyperparameters: a learning rate of 2e-05, a batch size of 8, and a linear learning-rate scheduler with 500 warmup steps. Training ran for 3 epochs, resulting in the metrics above.
- Trained on 12 different datasets including Lakera/gandalf_ignore_instructions and ChatGPT-Jailbreak-Prompts
- Implements both standard Transformers pipeline and optimized ONNX runtime support
- Compatible with LangChain and LLM Guard integration
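The reported hyperparameters can be collected into a single configuration. The key names below follow Hugging Face `TrainingArguments` conventions, but the actual training script is not published in this card, so treat this as a sketch of the stated settings rather than the exact recipe.

```python
# Fine-tuning hyperparameters as reported in the card; key names are
# assumed to map onto Hugging Face TrainingArguments fields.
training_config = {
    "learning_rate": 2e-5,               # reported learning rate
    "per_device_train_batch_size": 8,    # reported batch size
    "lr_scheduler_type": "linear",       # linear scheduler
    "warmup_steps": 500,                 # reported warmup steps
    "num_train_epochs": 3,               # reported epoch count
}
```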
Core Capabilities
- Binary classification of prompt injection attempts
- Handles sequences up to 512 tokens
- GPU-compatible with CUDA support
- Achieves 99.99% accuracy on the evaluation set
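Because the model accepts at most 512 tokens, longer inputs must be truncated or split before classification. One option is to scan overlapping windows so no part of a long prompt escapes inspection; the helper below is an illustrative sketch (window size, stride, and the whitespace-token stand-in are assumptions, not the card's official recipe).

```python
MAX_TOKENS = 512  # DeBERTa-v3-base sequence limit

def chunk_tokens(tokens: list, max_len: int = MAX_TOKENS, stride: int = 64) -> list:
    """Split a long token list into overlapping windows so every part of
    the input can be classified; each window fits within the model limit."""
    if len(tokens) <= max_len:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += max_len - stride  # overlap so injections spanning a boundary are seen
    return chunks
```

Each chunk can then be classified independently, and the input flagged if any window scores as an injection.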
Frequently Asked Questions
Q: What makes this model unique?
The model combines state-of-the-art DeBERTa architecture with comprehensive training on diverse prompt injection datasets, achieving near-perfect accuracy while maintaining practical deployment capabilities through ONNX optimization.
Q: What are the recommended use cases?
The model is ideal for security implementations in LLM systems, API endpoints, and chat interfaces where detecting and preventing prompt injection attacks is crucial. It can be integrated as a pre-processing step in production environments.
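The pre-processing pattern described above can be sketched as a simple gate in front of the LLM call. Here `classify` and `llm` are stand-ins for the real detection pipeline and model client, and the `guarded_llm_call` name and 0.9 threshold are illustrative assumptions.

```python
def guarded_llm_call(user_input: str, classify, llm, threshold: float = 0.9) -> dict:
    """Pre-processing guard: run the injection detector first and only
    forward inputs that are not flagged on to the LLM."""
    pred = classify(user_input)
    if pred["label"] == "INJECTION" and pred["score"] >= threshold:
        # Block the request before it ever reaches the model.
        return {"blocked": True, "reason": "possible prompt injection"}
    return {"blocked": False, "response": llm(user_input)}
```

In production the same gate can sit in an API middleware layer, rejecting flagged requests with an error instead of invoking the model.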