This study compares Random Forest (RF) and XGBoost for daily Air Quality Index (AQI) forecasting across five major Indian metropolitan cities (Delhi, Mumbai, Kolkata, Chennai and Bangalore) over the period 2021 to 2025. The dataset comprises 7,605 city-day observations drawn from the Central Pollution Control Board (CPCB) monitoring archive, enriched with temporal lag features, rolling-window statistics and cyclical calendar encodings. Both models were trained on an 80/20 temporal split and evaluated using RMSE, MAE and R². RF outperformed XGBoost in all five cities, with an overall R²=0.6565 versus 0.6390 for XGBoost and peak performance in Mumbai (R²=0.8627) and Delhi (R²=0.8516). Feature importance analysis confirmed the primacy of lagged AQI values (AQI_lag1: 86.34% of RF importance), underscoring the strongly autoregressive nature of urban air quality dynamics. Seasonal analysis identified Winter as the highest-pollution season (mean AQI=161.75) and the most challenging for accurate prediction. The findings provide actionable guidance for data-driven early-warning systems in emerging-economy urban contexts.
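The feature construction and evaluation protocol summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: the 7-day window, the month-based sine/cosine encoding, the synthetic AQI series and the persistence baseline (predicting tomorrow's AQI as today's) are all assumptions made for illustration, and neither RF nor XGBoost is fitted here. Only the Python standard library is used.

```python
import math

def cyclical_encode(value, period):
    """Map a calendar value (e.g. month 1..12) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def build_features(aqi, months, window=7):
    """Build (X, y) for one city's chronologically ordered daily AQI.

    Each row uses only values from before day t, so no future
    information leaks into the features:
    [AQI_lag1, rolling mean of the previous `window` days, month sin, month cos]
    """
    X, y = [], []
    for t in range(window, len(aqi)):
        past = aqi[t - window:t]
        m_sin, m_cos = cyclical_encode(months[t], 12)
        X.append([aqi[t - 1], sum(past) / window, m_sin, m_cos])
        y.append(aqi[t])
    return X, y

def temporal_split(X, y, train_frac=0.8):
    """Split without shuffling: the earliest rows train, the latest rows test."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

# Synthetic, deterministic series standing in for one city's daily AQI (assumption).
days = range(400)
aqi = [120 + 50 * math.sin(2 * math.pi * d / 365) + 5 * ((d * 37) % 7 - 3) for d in days]
months = [(d // 30) % 12 + 1 for d in days]  # rough month index, illustration only

X, y = build_features(aqi, months)
X_train, y_train, X_test, y_test = temporal_split(X, y)

# Persistence baseline: the prediction is simply the AQI_lag1 column.
baseline = [row[0] for row in X_test]
print("RMSE:", rmse(y_test, baseline))
print("MAE: ", mae(y_test, baseline))
print("R2:  ", r2(y_test, baseline))
```

Given how much of the RF importance the abstract attributes to AQI_lag1, a persistence baseline of this kind is a natural reference point against which any fitted model's R² could be judged.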
Keywords: Air Quality Index; Random Forest; XGBoost; Urban Air Quality Forecasting; Machine Learning; India; Emerging Economies
IRE Journals:
Moksha Vora, Dr. Syed Shahid Raza, "Comparative Evaluation of Machine Learning Models for Urban Air Quality Forecasting in Emerging Economies," Iconic Research And Engineering Journals, Volume 9, Issue 11, 2026, Pages 1190-1196. https://doi.org/10.64388/IREV9I11-1717704
IEEE:
Moksha Vora, Dr. Syed Shahid Raza, "Comparative Evaluation of Machine Learning Models for Urban Air Quality Forecasting in Emerging Economies," Iconic Research And Engineering Journals, 9(11). https://doi.org/10.64388/IREV9I11-1717704