A Robust and Explainable Transformer-Based Framework for Phishing Email Detection
Abstract:
As cyber threats grow more complex, email-based phishing remains the most enduring attack vector. These campaigns exploit human weaknesses to distribute malware or gain illicit access to confidential data. Transformer-based architectures improve detection through stronger contextual language understanding, but their "black box" nature and the resulting lack of interpretability remain a significant hurdle. Furthermore, the emergence of AI-driven attacks poses new challenges to model stability.
To address these issues, this study introduces a lightweight phishing detection system built on DistilBERT, a distilled, smaller variant of the BERT Transformer model. Two training-time techniques strengthen the framework’s resilience: stochastic character-level perturbations harden it against character-level input noise, and gradient-based adversarial training with the Fast Gradient Method (FGM) hardens it against embedding-level perturbations.
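The two robustness techniques named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which character operations are used, so the swap, delete, and duplicate operations, the `rate` and `epsilon` parameters, and the function names `perturb_chars` and `fgm_perturbation` are all assumptions made for illustration. The FGM step uses the commonly cited form r = epsilon * g / ||g||_2, applied here to a plain list of numbers rather than a real embedding tensor.

```python
import math
import random


def perturb_chars(text, rate=0.1, seed=None):
    """Apply random character-level noise to a string.

    Each alphanumeric character is perturbed with probability `rate` using
    one of three hypothetical operations: swap with the next character,
    delete, or duplicate.
    """
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isalnum() and rng.random() < rate:
            op = rng.choice(("swap", "delete", "duplicate"))
            if op == "swap" and i + 1 < len(text):
                # Transpose this character with the one that follows it.
                out.append(text[i + 1])
                out.append(c)
                i += 2
                continue
            if op == "delete":
                i += 1
                continue
            # Duplicate (also the fallback when a swap hits the last character).
            out.append(c + c)
        else:
            out.append(c)
        i += 1
    return "".join(out)


def fgm_perturbation(grad, epsilon=1.0):
    """Return the FGM perturbation r = epsilon * g / ||g||_2 for a gradient vector."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: leave the embedding unchanged.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


if __name__ == "__main__":
    print(perturb_chars("Please verify your account today", rate=0.2, seed=0))
    print(fgm_perturbation([3.0, 4.0], epsilon=0.5))
```

In a typical adversarial training loop, the perturbation would be added to the embedding weights, the loss recomputed on the perturbed input and backpropagated, and the original embeddings restored before the optimizer step. Whether the paper follows exactly this schedule is not stated in the abstract.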
To enhance transparency, the system incorporates three leading Explainable AI (XAI) techniques: LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and IG (Integrated Gradients). These techniques are used to explain the model’s decisions. In addition, a structured, rule-based prompt combines the model’s prediction with the XAI-derived features and directs Flan-T5-Small to produce clear, evidence-backed explanations in plain language.
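One way such a rule-based prompt could be assembled is sketched below. The abstract does not give the prompt template or say how the three attribution methods are combined, so everything here is hypothetical: the per-method max-normalization, the `top_tokens` and `build_prompt` helpers, the prompt wording, and the toy attribution scores. In practice the scores would come from LIME, SHAP, and Integrated Gradients run against the trained classifier.

```python
def top_tokens(attributions, k=3):
    """Rank tokens by summed, per-method-normalized absolute attribution.

    `attributions` maps a method name to a {token: score} dict. Scores are
    divided by the largest absolute score within each method so that methods
    with different scales contribute comparably (an assumption, not the
    paper's stated procedure).
    """
    totals = {}
    for scores in attributions.values():
        scale = max((abs(s) for s in scores.values()), default=0.0) or 1.0
        for token, score in scores.items():
            totals[token] = totals.get(token, 0.0) + abs(score) / scale
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(totals, key=lambda t: (-totals[t], t))[:k]


def build_prompt(label, confidence, attributions, k=3):
    """Combine the prediction and the top-k evidence tokens into a fixed template."""
    evidence = ", ".join(f'"{t}"' for t in top_tokens(attributions, k))
    return (
        "Explain the classification below in plain language for a non-expert.\n"
        f"Prediction: {label} (confidence {confidence:.0%}).\n"
        f"Most influential words: {evidence}.\n"
        "Rules: mention only the listed words; do not invent evidence; "
        "answer in two sentences."
    )


if __name__ == "__main__":
    # Toy scores for illustration only; not results from the paper.
    toy_attributions = {
        "lime": {"verify": 0.40, "account": 0.25, "urgent": 0.30, "hello": -0.02},
        "shap": {"verify": 0.35, "account": 0.20, "urgent": 0.10, "hello": 0.01},
        "ig": {"verify": 0.50, "urgent": 0.45, "account": 0.15, "hello": 0.00},
    }
    print(build_prompt("phishing", 0.97, toy_attributions))
```

The resulting string would then be passed to a text-to-text model such as Flan-T5-Small. Restricting the prompt to a fixed template and a short evidence list is one way to keep the generated explanation tied to what the attribution methods actually reported.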
Experimental findings indicate that the proposed framework surpasses a conventional DistilBERT detection model, trained without robustness enhancements, in both accuracy and resilience. By combining reliability with interpretability, this integrated approach aims to close the gap between model performance and user trust, thereby promoting more transparent phishing detection.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



