Published
Dec 14, 2024
Updated
Dec 14, 2024

Can AI Label Cybersecurity Threats Effectively?

Labeling NIDS Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models
By
Nir Daniel, Florian Klaus Kaiser, Shay Giladi, Sapir Sharabi, Raz Moyal, Shalev Shpolyansky, Andres Murillo, Aviad Elyashar, Rami Puzis

Summary

Security analysts face an overwhelming deluge of alerts from Network Intrusion Detection Systems (NIDS) such as Snort. Many of these alerts lack context, making it difficult to distinguish real threats from noise. Could AI help connect these alerts to real-world attack techniques? Researchers explored this question by testing the ability of Large Language Models (LLMs) like ChatGPT, Claude, and Gemini to label NIDS rules with tactics and techniques from the MITRE ATT&CK framework, a knowledge base of adversary behavior.

The results revealed a fascinating interplay between the strengths of LLMs and traditional machine learning (ML). While LLMs offered explainable and scalable initial mappings, suggesting their potential for generating hypotheses and aiding less experienced analysts, traditional ML models trained on labeled data consistently achieved higher precision and recall. Specifically, an SVM model trained on Gemini-generated labels achieved the highest F1-scores: 0.87 for technique labeling and 0.92 for tactic labeling.

This suggests a future where human analysts use LLMs to get a first look at complex alerts and then rely on finely tuned ML models for accurate classification within critical systems. The research highlights the potential of a hybrid approach, combining the reasoning of LLMs with the precision of ML, to strengthen our defenses against ever-evolving cyber threats. It offers a glimpse into a future where AI-powered tools empower security teams to react faster and more effectively to attacks.
🍰 Interested in building your own agents?
PromptLayer provides the tools to manage and monitor prompts with your whole team. Get started for free.

Question & Answers

How does the hybrid AI approach combine LLMs and traditional ML for cybersecurity threat detection?
The hybrid approach uses LLMs (like ChatGPT and Claude) for initial alert analysis and hypothesis generation, while leveraging trained ML models (specifically SVM) for precise classification. The process works in two stages: first, LLMs provide explainable, contextual mappings of alerts to the MITRE ATT&CK framework, helping analysts understand potential threats; then, traditional ML models, which achieved F1-scores of 0.87 for technique labeling and 0.92 for tactic labeling, provide accurate classification. This combination allows security teams to benefit from both the LLMs' reasoning capabilities and the ML models' precision in critical systems.
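The two-stage idea can be illustrated with a minimal sketch. Here `llm_label()` is a stand-in for a real LLM API call, and the tiny set of Snort-style rules and tactic labels is invented for illustration; the paper's actual pipeline and prompts may differ.

```python
# Sketch of the two-stage hybrid pipeline: an LLM proposes ATT&CK labels,
# then a classical SVM is trained on those labels for precise classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def llm_label(rule_text: str) -> str:
    """Stage 1 stand-in: an LLM maps a NIDS rule to an ATT&CK tactic.
    A real implementation would prompt ChatGPT/Claude/Gemini here."""
    return "credential-access" if "brute" in rule_text else "discovery"

# Invented example Snort-style rules.
rules = [
    'alert tcp any any -> any 22 (msg:"SSH brute force attempt";)',
    'alert tcp any any -> any 3389 (msg:"RDP brute force login";)',
    'alert icmp any any -> any any (msg:"ICMP ping sweep scan";)',
    'alert tcp any any -> any 445 (msg:"SMB share enumeration scan";)',
]
labels = [llm_label(r) for r in rules]  # stage 1: LLM-generated hypotheses

# Stage 2: train a classical ML model (SVM over TF-IDF features) on the
# LLM-labeled data, mirroring the LLM-teaches-ML setup described above.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(rules, labels)

print(clf.predict(['alert tcp any any -> any 21 (msg:"FTP brute force";)']))
```

The design point is that the expensive, explainable LLM call runs once per rule to produce training labels, while the cheap SVM handles high-volume classification afterward.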
What are the main benefits of AI-powered cybersecurity tools for businesses?
AI-powered cybersecurity tools offer three key benefits for businesses: First, they help manage the overwhelming volume of security alerts by automatically filtering and categorizing potential threats. Second, they provide faster threat detection and response times compared to manual analysis, reducing the risk of successful cyberattacks. Finally, these tools can help less experienced security analysts make better decisions by providing context and initial threat assessments. This makes cybersecurity more accessible and efficient for organizations of all sizes, ultimately improving their security posture.
How is artificial intelligence improving threat detection in everyday security systems?
Artificial intelligence is revolutionizing everyday security systems by automating the analysis of potential threats and providing more accurate detection capabilities. In practical terms, AI helps security systems distinguish between genuine threats and false alarms, reducing unnecessary alerts while catching real security issues. For example, in home security systems, AI can differentiate between a burglar and a pet moving around, or in corporate networks, it can identify unusual patterns that might indicate a cyber attack. This makes security systems more reliable and effective while requiring less human intervention.

PromptLayer Features

1. Testing & Evaluation
The paper's comparison of LLM vs. ML model performance aligns with PromptLayer's testing capabilities for measuring prompt effectiveness.
Implementation Details
Set up A/B testing between different LLM prompts for NIDS rule classification, track performance metrics, and compare against ML model baselines
Key Benefits
• Quantitative performance tracking across different prompt versions
• Systematic evaluation of prompt effectiveness for security classifications
• Data-driven optimization of prompt engineering
Potential Improvements
• Add security-specific evaluation metrics
• Implement automated regression testing for prompt updates
• Create specialized scoring methods for security context
Business Value
Efficiency Gains
Reduces time spent manually evaluating prompt effectiveness
Cost Savings
Minimizes API costs through optimized prompt selection
Quality Improvement
Ensures consistent and reliable threat classification performance
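An A/B comparison of prompt variants can come down to scoring each variant's labels against an analyst-curated gold set. A minimal sketch follows; all technique labels here are invented for illustration, and the macro-averaged F1 mirrors the metric reported in the paper.

```python
# Sketch: compare two prompt variants by macro-F1 against gold ATT&CK labels.
from sklearn.metrics import f1_score

gold     = ["T1110", "T1046", "T1110", "T1021", "T1046"]  # analyst labels
prompt_a = ["T1110", "T1046", "T1021", "T1021", "T1046"]  # variant A output
prompt_b = ["T1110", "T1110", "T1110", "T1021", "T1046"]  # variant B output

for name, preds in [("prompt_a", prompt_a), ("prompt_b", prompt_b)]:
    # Macro averaging weights every technique equally, so rare techniques
    # are not drowned out by frequent ones.
    score = f1_score(gold, preds, average="macro")
    print(f"{name}: macro-F1 = {score:.2f}")
```

The same loop extends naturally to tracking scores per prompt version over time, which is the regression-testing pattern suggested above.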
2. Workflow Management
The hybrid LLM-ML approach suggests the need for orchestrated workflows combining initial LLM analysis with ML validation.
Implementation Details
Create multi-step templates that process NIDS rules through LLM classification first, then ML validation, tracking results at each stage
Key Benefits
• Reproducible security analysis pipelines
• Version-controlled prompt templates
• Automated workflow orchestration
Potential Improvements
• Add parallel processing capabilities
• Implement feedback loops for continuous improvement
• Create specialized security workflow templates
Business Value
Efficiency Gains
Streamlines complex multi-step security analysis processes
Cost Savings
Reduces manual intervention in analysis workflows
Quality Improvement
Ensures consistent application of both LLM and ML analysis steps
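The multi-step template with per-stage result tracking might look like the following sketch. The stage functions are stand-ins (a real pipeline would call an LLM API and a trained ML model), and the record structure is an assumption, not PromptLayer's actual API.

```python
# Sketch: a two-step NIDS-rule analysis workflow that records the output
# of each stage, so every intermediate result stays auditable.
from dataclasses import dataclass, field

@dataclass
class AnalysisRecord:
    rule: str
    stages: dict = field(default_factory=dict)

def llm_stage(record: AnalysisRecord) -> AnalysisRecord:
    # Stand-in for stage 1: an LLM producing an initial ATT&CK hypothesis.
    record.stages["llm_hypothesis"] = "T1110 (Brute Force)"
    return record

def ml_stage(record: AnalysisRecord) -> AnalysisRecord:
    # Stand-in for stage 2: a trained ML model validating the hypothesis.
    record.stages["ml_label"] = "T1110"
    return record

def run_pipeline(rule: str, steps) -> AnalysisRecord:
    record = AnalysisRecord(rule)
    for step in steps:
        record = step(record)  # each stage appends its result to the record
    return record

result = run_pipeline('alert tcp ... (msg:"SSH brute force";)',
                      [llm_stage, ml_stage])
print(result.stages)
```

Keeping each stage's output on the record is what makes the pipeline reproducible: a disagreement between the LLM hypothesis and the ML label can be flagged for human review rather than silently overwritten.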

The first platform built for prompt engineering