A Lightweight Hybrid MLP-Based Framework for Real-Time Phishing URL Detection Using Structural URL Features
Title: An Efficient Hybrid MLP Architecture for Real-Time Phishing Detection Leveraging URL Structure
Phishing campaigns continue to pose a significant risk to cybersecurity by utilizing deceptive web addresses to compromise user data. Conventional detection methods, which rely on blacklists or static rules, are inherently reactive and frequently struggle to detect novel phishing attempts. To address this limitation, this study introduces a lightweight, hybrid system designed for real-time identification of malicious URLs. This approach integrates blacklist filtering with a Multi-Layer Perceptron (MLP) classifier that analyzes only the structural attributes of the URL.
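The two-stage decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names check_url and score_fn, the 0.5 threshold, and the host-level blacklist lookup are all assumptions, and score_fn stands in for a trained MLP that returns a phishing probability.

```python
from typing import Callable, Set
from urllib.parse import urlsplit


def check_url(
    url: str,
    blacklist: Set[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> str:
    """Return a verdict using a blacklist lookup first, then a model score."""
    host = (urlsplit(url).hostname or "").lower()

    # Stage 1: cheap exact-match lookup against known-bad hosts.
    if host in blacklist:
        return "phishing (blacklist)"

    # Stage 2: fall back to the classifier for URLs the blacklist has not seen.
    probability = score_fn(url)
    return "phishing (model)" if probability >= threshold else "legitimate"


if __name__ == "__main__":
    known_bad = {"evil.example"}

    # Hypothetical stand-in for the MLP: flags URLs containing "@".
    def toy_score(u: str) -> float:
        return 0.9 if "@" in u else 0.1

    print(check_url("http://evil.example/login", known_bad, toy_score))
    print(check_url("http://user@new-site.example/", known_bad, toy_score))
    print(check_url("https://example.com/", known_bad, toy_score))
```

Placing the blacklist first means known-bad hosts are rejected without invoking the model, while the classifier covers URLs that no list has recorded yet.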
The proposed framework extracts 16 features from the URL string alone, covering domain-specific and security-related properties. Because feature extraction requires no access to webpage content, external APIs, or visual rendering, it is computationally cheap enough for real-time deployment. The model was trained and evaluated on the PhiUSIIL dataset, which comprises 235,795 labeled URLs.
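To illustrate what content-free, structure-only feature extraction can look like, the sketch below derives a handful of values from the URL string using only the Python standard library. The abstract does not enumerate the 16 features, so the eight shown here and the name extract_features are hypothetical examples of the general kind, not the study's actual feature set.

```python
import ipaddress
from urllib.parse import urlsplit


def extract_features(url: str) -> dict:
    """Compute illustrative structural features from a URL string.

    No network request is made; everything is derived from the text itself.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""

    # A raw IP address in place of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ip": host_is_ip,
        "has_at_symbol": int("@" in url),
    }


if __name__ == "__main__":
    for name, value in extract_features("https://example.com/login").items():
        print(f"{name}: {value}")
```

A vector of such values, one per URL, is the kind of input an MLP classifier would consume.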
The MLP classifier attained an accuracy of 99.24%, a precision of 98.74%, a recall of 99.95%, an F1-score of 99.34%, and a ROC-AUC of 99.65%. These figures surpass those of the baseline models (Random Forest, Logistic Regression, XGBoost, LightGBM, and CatBoost) under identical evaluation conditions. The hybrid architecture processed URLs with an average inference time of 1.2 milliseconds and reached a peak throughput of 4,200 URLs per second under concurrent load.
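As a quick consistency check on these figures, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the stated precision and recall, and converts the 1.2 ms average inference time into an implied single-stream rate. The function names are illustrative, and the single-stream rate is a back-of-envelope estimate rather than a number reported in the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def urls_per_second(latency_ms: float) -> float:
    """URLs handled per second by one sequential stream at the given latency."""
    return 1000.0 / latency_ms


# Precision and recall as reported in the abstract.
f1 = f1_score(0.9874, 0.9995)

# Average inference time as reported in the abstract.
sequential_rate = urls_per_second(1.2)

print(f"F1 = {f1:.4f}")
print(f"Sequential rate = {sequential_rate:.0f} URLs/s")
```

The recomputed F1 rounds to 99.34%, which agrees with the reported value. A single sequential stream at 1.2 ms per URL handles roughly 833 URLs per second, so the 4,200 URLs per second peak would correspond to about five-fold concurrency, assuming per-URL latency stays near the average under load.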
Additionally, a functional desktop prototype named CyberGuard was developed to illustrate the practical applicability of the system. The findings suggest that this framework offers a highly accurate and resource-efficient method for detecting phishing URLs in real time, particularly in environments with limited computational resources.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





