Artificial Intelligence (AI) systems, particularly those built on deep learning, have achieved impressive performance across computer vision, natural language processing, healthcare, and autonomous systems. Their growing reliance on large-scale neural networks, however, leaves them vulnerable to a critical class of threats: adversarial attacks. These attacks deliberately manipulate input data with carefully crafted, typically imperceptible, perturbations that cause AI models to produce erroneous predictions or classifications. Such adversarial examples raise serious doubts about the resilience, stability, and safety of AI-based technologies in safety-critical applications such as self-driving cars, biometric authentication, and medical diagnosis. This paper describes adversarial attacks in detail and categorizes them into white-box, black-box, and gray-box settings according to the attacker's knowledge of the target system. It also discusses prominent attack techniques, including the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and optimization-based approaches, which exploit model gradients and sensitivities. In parallel, the paper examines current defense mechanisms for improving AI resilience, such as adversarial training, defensive distillation, input transformation, gradient masking, and certified robustness methods. Despite substantial progress, most defenses suffer from limited scalability, poor generalization, or high computational cost, sustaining a continuous arms race between attack generation and model defense. The paper concludes by outlining emerging research directions, including explainable AI, robust optimization, and the incorporation of security-by-design principles into neural network development. Strengthening AI models against adversarial manipulation is essential to making them trustworthy, transparent, and safe to deploy in the real world.
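To make the gradient-based attack idea concrete, the following is a minimal FGSM sketch in PyTorch. It is an illustrative assumption rather than code from the paper: the toy linear model, the dummy data, and the epsilon value are placeholders chosen only so the example is self-contained and runs end to end.

import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon):
    """One-step FGSM: perturb x by epsilon in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # x_adv = x + epsilon * sign(grad_x loss), detached so it can be reused as a plain input
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Toy usage with a random linear "model" and dummy labels (assumptions, for illustration only).
model = nn.Linear(10, 3)
x = torch.randn(4, 10)          # 4 samples, 10 features
y = torch.randint(0, 3, (4,))   # dummy class labels
x_adv = fgsm_attack(model, x, y, epsilon=0.1)
print((x_adv - x).abs().max())  # maximum perturbation is bounded by epsilon

PGD can be viewed as iterating this step several times with a projection back onto the epsilon-ball around the original input, which is why both methods are described above as exploiting model gradients.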
IRE Journals:
Geetha Aradhyula, "Adversarial Attacks and Defense Mechanisms in AI," Iconic Research And Engineering Journals, Volume 8, Issue 5, 2024, Pages 1453-1461. https://doi.org/10.64388/IREV9I5-1711957
IEEE:
Geetha Aradhyula, "Adversarial Attacks and Defense Mechanisms in AI," Iconic Research And Engineering Journals, vol. 8, no. 5, pp. 1453-1461, 2024. https://doi.org/10.64388/IREV9I5-1711957