final-complete-malicious-url-model

r3ddkahili

BERT model fine-tuned with LoRA for malicious URL detection, reaching 98% validation accuracy. It classifies URLs into four categories (benign, defacement, phishing, and malware) and has 110M parameters.

Property	Value
Model Type	BERT-based URL Classifier with LoRA
Parameters	110M
Accuracy	98% (validation)
F1 Score	0.965
Author	r3ddkahili
Model URL	https://huggingface.co/r3ddkahili/final-complete-malicious-url-model

What is final-complete-malicious-url-model?

This is a specialized BERT-based model fine-tuned with Low-Rank Adaptation (LoRA) to detect malicious URLs in real time. Built on bert-base-uncased, it classifies URLs into four categories: benign, defacement, phishing, and malware. The model was trained on the Kaggle Malicious URLs Dataset, which contains 651,191 samples.
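As a sequence classifier, the model produces one raw score (logit) per category for each URL. The sketch below shows one way such scores could be turned into a label and a confidence value with a softmax. The index order in `ID2LABEL` is an assumption for illustration only; the real mapping would typically come from the model's configuration (`id2label`), and the logits are made up.

```python
import math
from typing import List, Tuple

# Hypothetical index-to-label order for illustration only; the real mapping
# should be read from the model's configuration (id2label).
ID2LABEL = {0: "benign", 1: "defacement", 2: "phishing", 3: "malware"}

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits: List[float]) -> Tuple[str, float]:
    """Return the highest-probability category and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

label, prob = decode([0.2, -1.0, 3.1, 0.5])  # made-up logits for one URL
print(label, round(prob, 3))
```

Reporting the probability alongside the label lets a downstream tool apply its own alert threshold instead of acting on every argmax prediction.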

Implementation Details

The model uses the Hugging Face Transformers library with a PyTorch backend and the PEFT library for parameter-efficient fine-tuning. Inputs are limited to a maximum sequence length of 128 tokens, and training used the AdamW optimizer with weight decay and a weighted cross-entropy loss.
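The card states that training used weighted cross-entropy but not how the class weights were chosen. The sketch below assumes inverse-frequency weighting (a common choice for imbalanced data, not confirmed by the card) and computes the loss in plain Python, normalising by the summed weights of the targets as PyTorch's `CrossEntropyLoss(weight=..., reduction="mean")` does. The labels and logits are toy values.

```python
import math
from collections import Counter
from typing import List

def inverse_frequency_weights(labels: List[int], num_classes: int) -> List[float]:
    """w_c = N / (num_classes * count_c): rarer classes get larger weights.
    Assumes every class appears at least once in `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) for c in range(num_classes)]

def weighted_cross_entropy(batch_logits: List[List[float]], targets: List[int],
                           weights: List[float]) -> float:
    """Sum of w[y] * -log softmax(logits)[y], divided by the sum of w[y] over the batch."""
    total, norm = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += weights[y] * (log_sum_exp - logits[y])  # w[y] * negative log-likelihood
        norm += weights[y]
    return total / norm

# Toy, made-up labels: class 0 dominates, classes 1 and 3 are rare.
weights = inverse_frequency_weights([0, 0, 0, 0, 0, 0, 1, 2, 2, 3], num_classes=4)
loss = weighted_cross_entropy([[2.0, 0.1, 0.3, -1.0], [0.5, 1.5, 0.0, 0.2]], [0, 1], weights)
print([round(w, 3) for w in weights], round(loss, 4))
```

Up-weighting rare classes makes a misclassified minority example (for instance, a malware URL) cost more than a misclassified majority example, which counteracts the pull toward always predicting the dominant class.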

Batch Size: 16
Training Epochs: 5
Learning Rate: 2e-5
Evaluation Strategy: Epoch-based
Fine-Tuning: LoRA applied to BERT layers
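The card does not say which BERT weight matrices received LoRA adapters or what rank was used. As a rough illustration of the mechanism, the sketch below wraps one frozen linear map W with a trainable low-rank update, y = W x + (alpha / r) · B (A x). The rank `r`, scaling `alpha`, and Gaussian initialisation of A are assumptions chosen for illustration; B is zero-initialised, as in the original LoRA paper, so training starts from the pretrained behaviour. In practice the PEFT library handles this inside the model.

```python
import random
from typing import List

Matrix = List[List[float]]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: Matrix, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, d_out x d_in, kept frozen
        # A: r x d_in, small random init (assumed); B: d_out x r, zero init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)               # frozen pretrained path
        update = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * u for b, u in zip(base, update)]

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=2, alpha=4.0)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: B starts at zero, so output equals W x
```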

Core Capabilities

Real-time URL classification with 98% validation accuracy
Category-specific performance: Benign (F1: 0.985), Defacement (F1: 0.985), Phishing (F1: 0.935), Malware (F1: 0.955)
Can be integrated into browser extensions and security tools
Suitable for deployment in Security Operations Centers (SOCs)
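The unweighted mean of the four per-category F1 scores equals the 0.965 listed in the property table, which suggests (though the card does not say so) that the headline F1 is a macro average. The arithmetic:

```python
# Per-category F1 scores as reported in this model card.
per_class_f1 = {"benign": 0.985, "defacement": 0.985, "phishing": 0.935, "malware": 0.955}

# Macro F1: every category counts equally, regardless of how many samples it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # 0.965
```

Because a macro average gives phishing (the weakest category) the same weight as benign, it is a stricter summary than overall accuracy on an imbalanced dataset.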

Frequently Asked Questions

Q: What makes this model unique?

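The model combines BERT's language understanding with LoRA fine-tuning. LoRA freezes the pretrained weights and trains only small low-rank adapter matrices, so fine-tuning is much cheaper than updating all 110M parameters while still reaching high accuracy. Because the model distinguishes defacement, phishing, and malware instead of giving a single malicious/benign verdict, it is particularly useful for cybersecurity applications that need to know the type of threat.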
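To put the efficiency claim in numbers: bert-base-uncased has a hidden size of 768, so one attention projection is a 768 × 768 matrix. A rank-r LoRA adapter for it adds r · (768 + 768) trainable parameters instead of updating all 768² weights. The rank r = 8 below is an assumed value; the card does not state the rank that was used.

```python
def lora_param_counts(d_in: int, d_out: int, r: int):
    """Full weight size vs. LoRA adapter size (A: r x d_in, B: d_out x r)."""
    full = d_in * d_out
    lora = r * (d_in + d_out)
    return full, lora, lora / full

full, lora, frac = lora_param_counts(768, 768, r=8)  # r=8 is an assumed rank
print(full, lora, f"{frac:.2%}")  # 589824 12288 2.08%
```

The adapter grows linearly with r while the full matrix grows with the product of its dimensions, which is why low ranks keep the trainable fraction small.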

Q: What are the recommended use cases?

The model is suited to real-time URL classification in cybersecurity tools, browser extensions that give instant threat alerts, phishing detection systems, and security monitoring in SOCs. It can be deployed through a Streamlit web app, a browser extension, or a REST API.
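The card mentions REST API integration without describing an interface. The sketch below shows one possible framework-agnostic request handler, assuming a JSON body with a `url` field. The field names, status codes, and the `classify` callable (a stand-in for real model inference) are all hypothetical; the handler could sit behind any HTTP server.

```python
import json
from typing import Callable, Dict, Tuple

def handle_request(body: bytes, classify: Callable[[str], Dict]) -> Tuple[int, bytes]:
    """Validate a JSON body such as {"url": "..."} and return (HTTP status, JSON bytes)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, json.dumps({"error": "invalid JSON"}).encode()
    url = payload.get("url") if isinstance(payload, dict) else None
    if not isinstance(url, str) or not url:
        return 400, json.dumps({"error": "missing 'url' field"}).encode()
    # `classify` is a hypothetical callable wrapping model inference,
    # expected to return something like {"label": "phishing", "score": 0.97}.
    result = classify(url)
    return 200, json.dumps({"url": url, **result}).encode()

def fake_classify(url: str) -> Dict:
    """Placeholder for real inference; always returns a fixed result."""
    return {"label": "benign", "score": 0.99}

status, body = handle_request(b'{"url": "https://example.com"}', fake_classify)
print(status, body.decode())
```

Passing the classifier in as a parameter keeps the validation logic testable without loading the 110M-parameter model.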