Comparative Study of Deep Learning Architectures for Automated Diabetic Retinopathy Grading: Vision Transformer, Swin Transformer, and InceptionResNetV2
  • Author(s): Samir Mulla; Mahamadtohid Naikwadi; Prajwal Khandait; Aditya Sutar; Rajesh Kumar; Uma Gurav
  • Paper ID: 1718689
  • Page: 527-536
  • Published Date: 08-06-2026
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 9 Issue 12 June-2026
Abstract

Diabetic Retinopathy (DR) is a vision-threatening complication of diabetes mellitus that progresses silently through five clinically defined severity grades. Timely automated screening is critical to prevent irreversible vision loss, particularly in resource-constrained healthcare settings. This paper presents a systematic comparative study of three state-of-the-art deep learning architectures for five-class DR grading on the APTOS 2019 fundus image dataset (3,662 images): Vision Transformer (ViT-Base/16), Swin Transformer (swin_base_patch4_window7_224), and InceptionResNetV2. All models use transfer learning from ImageNet-pretrained weights. We evaluate each architecture in terms of classification accuracy, per-class F1-score, macro-averaged AUC, GradCAM-based explainability, training dynamics, and parameter efficiency. Our ViT-Base/16 model, fine-tuned end-to-end with AdamW, cosine annealing, and label smoothing, achieves the highest validation accuracy of 85.40% with a macro-averaged F1-score of 0.7247. Swin Transformer achieves 83.20% accuracy, and InceptionResNetV2, trained with two-stage transfer learning, achieves 81.40%. GradCAM visualizations confirm clinically aligned lesion localization across all three architectures. This work provides architectural insights for deploying robust DR screening systems in clinical environments.
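The abstract reports both overall accuracy (85.40%) and a noticeably lower macro-averaged F1-score (0.7247) for ViT-Base/16. The gap is expected on an imbalanced five-grade dataset: accuracy is dominated by the most frequent grade, while macro F1 gives each of the five grades (0 = No DR, 1 = Mild, 2 = Moderate, 3 = Severe, 4 = Proliferative DR) equal weight. The sketch below illustrates how macro F1 is computed. It is not the authors' evaluation code, and the helper names and toy labels are hypothetical, invented only to show the effect.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, num_classes=5):
    """F1 score for each class 0..num_classes-1 (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[t] += 1          # true class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of per-class F1: every DR grade counts equally."""
    return sum(per_class_f1(y_true, y_pred, num_classes)) / num_classes


def accuracy(y_true, y_pred):
    """Fraction of exactly correct grade predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equal-length, non-empty label lists")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy, made-up labels: grade 0 dominates, as it does in imbalanced DR data.
    y_true = [0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 4]
    y_pred = [0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 4]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.818
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.533
```

In this toy case, misclassifying the single Mild and single Severe image costs little accuracy but drives two of the five per-class F1 scores to zero, which pulls the macro average well below accuracy. For real experiments, scikit-learn's `f1_score` with `average="macro"` computes the same quantity.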

Keywords

Diabetic Retinopathy, Vision Transformer, Swin Transformer, InceptionResNetV2, Transfer Learning, GradCAM, Fundus Image Classification, Deep Learning, Medical Image Analysis

Citations

IRE Journals:
Samir Mulla, Mahamadtohid Naikwadi, Prajwal Khandait, Aditya Sutar, Rajesh Kumar, Uma Gurav "Comparative Study of Deep Learning Architectures for Automated Diabetic Retinopathy Grading: Vision Transformer, Swin Transformer, and InceptionResNetV2" Iconic Research And Engineering Journals Volume 9 Issue 12 2026 Page 527-536 https://doi.org/10.64388/IREV9I12-1718689

IEEE:
Samir Mulla, Mahamadtohid Naikwadi, Prajwal Khandait, Aditya Sutar, Rajesh Kumar, Uma Gurav "Comparative Study of Deep Learning Architectures for Automated Diabetic Retinopathy Grading: Vision Transformer, Swin Transformer, and InceptionResNetV2" Iconic Research And Engineering Journals, 9(12) https://doi.org/10.64388/IREV9I12-1718689