deberta-v3-base-injection

Maintained By
deepset

deberta-v3-base-injection

PropertyValue
Parameter Count184M
LicenseMIT
LanguagesEnglish, German
Training Accuracy99.14%

What is deberta-v3-base-injection?

deberta-v3-base-injection is a specialized model fine-tuned from Microsoft's DeBERTa-v3-base architecture for detecting prompt injection attempts in text. Developed by deepset, this model serves as a security tool to identify potentially malicious prompt manipulation attempts, classifying inputs as either "INJECTION" or "LEGIT".

Implementation Details

The model was trained using the prompt-injection dataset with careful consideration of hyperparameters, including a learning rate of 2e-05 and Adam optimizer. Training was conducted over 3 epochs with a batch size of 8, achieving impressive final validation metrics.

  • Built on Microsoft's DeBERTa-v3-base architecture
  • Trained using PyTorch 2.0.0 and Transformers 4.29.1
  • Implements safetensors for secure model storage
  • Achieves 99.14% accuracy on evaluation set

Core Capabilities

  • Binary classification of text inputs (INJECTION vs LEGIT)
  • Supports both English and German language processing
  • Optimized for security applications in AI systems
  • Highly accurate detection of prompt injection attempts

Frequently Asked Questions

Q: What makes this model unique?

This model specializes in detecting prompt injection attempts with extremely high accuracy (99.14%), making it valuable for securing AI systems against manipulation attempts. Its bilingual capability and foundation on the robust DeBERTa-v3 architecture set it apart from similar security models.

Q: What are the recommended use cases?

The model is ideal for securing AI systems, chatbots, and language models against prompt injection attacks. It can be used as a preprocessing step to filter potentially malicious inputs, though users can retrain it with custom legitimate examples if needed to reduce false positives.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.