Machine Learning-Based Prediction of Elevated Prostate-Specific Antigen Levels from Lifestyle and Demographic Data

Onitcha Nyerhovwo Edafetanure

doi:10.64388/IREV8I9-1714356

Paper 1714356 · Published · Vol 8 · Issue 9

Subject area: Science, Engineering and Technology · Area of research: Machine Learning in Prostate Screening

DOI: https://doi.org/10.64388/IREV8I9-1714356

Abstract

Background: Machine learning offers promising approaches for medical prediction tasks. This study compares three ML algorithms (Logistic Regression with L1 regularization, Support Vector Machine (SVM), and Random Forest) for predicting elevated prostate-specific antigen (PSA) levels from lifestyle and demographic features.

Objectives: To compare the predictive performance, generalization capability, and stability of multiple ML models for detecting elevated PSA levels, framed as binary classification of PSA status.

Methods: We implemented three ML algorithms with two feature selection approaches to address the events-per-variable (EPV) problem. Lifestyle and demographic data were collected from adult males in Etsako West LGA, Edo State, Nigeria. Models were trained on 70% of the data (n=69) and evaluated on the remaining 30% (n=30). Performance was assessed using accuracy, precision, recall, specificity, F1-score, and ROC-AUC. Model stability was evaluated with five-fold stratified cross-validation, and hyperparameters were optimized using GridSearchCV.

Results: The Random Forest model achieved the most balanced performance, with 73.3% accuracy, 70.6% precision, 80.0% recall, 63.6% specificity, 0.750 F1-score, and 0.764 ROC-AUC. SVM showed identical test set performance (73.3% accuracy, 0.764 ROC-AUC). Logistic Regression with L1 regularization and 11 features achieved the highest recall (100%) but at the cost of zero specificity (every test case was classified as positive), indicating overfitting. Cross-validation revealed model stability: Random Forest CV recall 0.833 ± 0.061, CV F1 0.813 ± 0.053. The F1-optimized Random Forest showed improved balance (70.0% accuracy, 66.7% recall, 50.0% specificity). All models had ROC-AUC between 0.714 and 0.764, indicating acceptable discrimination capability.

Conclusions: Random Forest and SVM demonstrated the most balanced performance in terms of sensitivity and specificity for PSA prediction in a small-sample setting. The study highlights important methodological considerations, including the need for feature selection under EPV constraints, the role of regularization in mitigating overfitting, and the importance of cross-validation for evaluating models' out-of-sample performance. The moderate performance (ROC-AUC ≈ 0.71–0.76) suggests that lifestyle-based ML models may be useful for preliminary screening but are not suitable for diagnostic applications. Future work should focus on larger datasets, external validation, and ensemble methods.
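
The threshold-based metrics named in the abstract (accuracy, precision, recall, specificity, F1-score) and the events-per-variable ratio all follow from simple counts. The sketch below is a minimal illustration under stated assumptions, not the paper's code: the function names `binary_metrics` and `events_per_variable` are hypothetical, and the confusion-matrix counts are made up for illustration and are not taken from the study. ROC-AUC is left out because it needs predicted scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float       # also called sensitivity
    specificity: float
    f1: float


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Compute threshold-based metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + fp + fn + tn),
        precision=precision,
        recall=recall,
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * precision * recall, precision + recall),
    )


def events_per_variable(n_events: int, n_features: int) -> float:
    """EPV: size of the smaller outcome class divided by the number of predictors."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return n_events / n_features


# Hypothetical counts for illustration only (NOT the study's confusion matrix).
balanced = binary_metrics(tp=8, fp=3, fn=2, tn=7)

# A classifier that labels every case positive: no false negatives, no true negatives.
all_positive = binary_metrics(tp=10, fp=10, fn=0, tn=0)

print(balanced)
print(all_positive)
print(events_per_variable(n_events=30, n_features=10))
```

The `all_positive` case shows why 100% recall paired with zero specificity signals a model that is not discriminating between classes: every negative case becomes a false positive. The EPV helper makes the small-sample constraint concrete, since reducing `n_features` through feature selection is the only way to raise the ratio when the number of events is fixed.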

How to cite this paper

Onitcha Nyerhovwo Edafetanure "Machine Learning-Based Prediction of Elevated Prostate-Specific Antigen Levels from Lifestyle and Demographic Data" Iconic Research And Engineering Journals Volume 8 Issue 9 2025 Page 1857-1866 https://doi.org/10.64388/IREV8I9-1714356

Onitcha Nyerhovwo Edafetanure "Machine Learning-Based Prediction of Elevated Prostate-Specific Antigen Levels from Lifestyle and Demographic Data" Iconic Research And Engineering Journals, vol. 8, no. 9, Mar. 2025, doi: https://doi.org/10.64388/IREV8I9-1714356

Onitcha Nyerhovwo Edafetanure (2025). Machine Learning-Based Prediction of Elevated Prostate-Specific Antigen Levels from Lifestyle and Demographic Data. Iconic Research And Engineering Journals, 8(9). doi: https://doi.org/10.64388/IREV8I9-1714356

Onitcha Nyerhovwo Edafetanure "Machine Learning-Based Prediction of Elevated Prostate-Specific Antigen Levels from Lifestyle and Demographic Data" Iconic Research And Engineering Journals, vol. 8, no. 9, Mar. 2025. Crossref, https://doi.org/10.64388/IREV8I9-1714356

@article{1714356,
      author = {Onitcha Nyerhovwo Edafetanure},
      title = {Machine Learning-Based Prediction of Elevated Prostate-Specific Antigen Levels from Lifestyle and Demographic Data},
      journal = {Iconic Research And Engineering Journals},
      year = {2025},
      volume = {8},
      number = {9},
      pages = {1857--1866},
      issn = {2456-8880},
      url = {https://www.irejournals.com/formatedpaper/1714356.pdf},
      abstract = {Background: Machine learning offers promising approaches for medical prediction tasks. This study compares three ML algorithms (Logistic Regression with L1 regularization, Support Vector Machine (SVM), and Random Forest) for predicting elevated prostate-specific antigen (PSA) levels from lifestyle and demographic features.
Objectives: To compare the predictive performance, generalization capability, and stability of multiple ML models for detecting elevated PSA levels, framed as binary classification of PSA status.
Methods: We implemented three ML algorithms with two feature selection approaches to address the events-per-variable (EPV) problem. Lifestyle and demographic data were collected from adult males in Etsako West LGA, Edo State, Nigeria. Models were trained on 70% of the data (n=69) and evaluated on the remaining 30% (n=30). Performance was assessed using accuracy, precision, recall, specificity, F1-score, and ROC-AUC. Model stability was evaluated with five-fold stratified cross-validation, and hyperparameters were optimized using GridSearchCV.
Results: The Random Forest model achieved the most balanced performance, with 73.3% accuracy, 70.6% precision, 80.0% recall, 63.6% specificity, 0.750 F1-score, and 0.764 ROC-AUC. SVM showed identical test set performance (73.3% accuracy, 0.764 ROC-AUC). Logistic Regression with L1 regularization and 11 features achieved the highest recall (100%) but at the cost of zero specificity (every test case was classified as positive), indicating overfitting. Cross-validation revealed model stability: Random Forest CV recall 0.833 ± 0.061, CV F1 0.813 ± 0.053. The F1-optimized Random Forest showed improved balance (70.0% accuracy, 66.7% recall, 50.0% specificity). All models had ROC-AUC between 0.714 and 0.764, indicating acceptable discrimination capability.
Conclusions: Random Forest and SVM demonstrated the most balanced performance in terms of sensitivity and specificity for PSA prediction in a small-sample setting. The study highlights important methodological considerations, including the need for feature selection under EPV constraints, the role of regularization in mitigating overfitting, and the importance of cross-validation for evaluating models' out-of-sample performance. The moderate performance (ROC-AUC ≈ 0.71–0.76) suggests that lifestyle-based ML models may be useful for preliminary screening but are not suitable for diagnostic applications. Future work should focus on larger datasets, external validation, and ensemble methods.},
      month = {March},
      doi = {10.64388/IREV8I9-1714356}
  }