Phishing remains a significant and evolving cybersecurity threat for businesses, because these attacks can often bypass rule-based and regulatory filters. Although recent advances in deep learning achieve high detection accuracy, the "black box" nature of these techniques limits their interpretability, which makes them difficult to deploy in a trustworthy manner in cybersecurity environments. This paper therefore presents a new data-driven phishing detection framework that balances high detection accuracy with feature interpretability. Using a dataset from the UCI Machine Learning Repository, we evaluated multiple supervised learning algorithms, including Logistic Regression, Support Vector Machines (SVM), and Random Forest, and applied the Synthetic Minority Over-sampling Technique (SMOTE) to correct for class imbalance. The Random Forest classifier performed best, with an accuracy of 95.5% and a ROC-AUC score of 0.97, outperforming the other models evaluated. More importantly, we extend the analysis beyond binary classification and report feature importances derived from the Random Forest classifier. Specifically, our findings indicate that features such as Having_IP_Address and URL_Length are strongly predictive of phishing versus legitimate intent. These results suggest that lightweight, interpretable ensemble models can serve as scalable and transparent alternatives to large, complex neural networks for real-time phishing detection.
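To make the class-balancing step concrete, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is an interpolation between an existing minority sample and one of its k nearest minority-class neighbours. It is an illustration under stated assumptions, not the implementation used in this paper. The function name `smote`, its parameters, and the two-feature toy rows are hypothetical, and treating phishing as the minority class here is an assumption made only for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data: hypothetical two-feature rows, NOT taken from the UCI dataset.
legitimate = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.2, 0.2]]
phishing = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]  # assumed minority class

# Oversample the minority class until both classes have the same size.
extra = smote(phishing, n_synthetic=len(legitimate) - len(phishing), k=2)
balanced_phishing = phishing + extra
print(len(legitimate), len(balanced_phishing))  # prints "6 6"
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used to report accuracy and ROC-AUC.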
IRE Journals:
Digvijay Purkayastha, "Phishing Detection Using Machine Learning: A Data-Driven Approach to Enhancing Cybersecurity Awareness," Iconic Research And Engineering Journals, Volume 9, Issue 8, 2026, Pages 1216-1220. https://doi.org/10.64388/IREV9I8-1714451
IEEE:
Digvijay Purkayastha, "Phishing Detection Using Machine Learning: A Data-Driven Approach to Enhancing Cybersecurity Awareness," Iconic Research And Engineering Journals, 9(8). https://doi.org/10.64388/IREV9I8-1714451