Frontiers in Oncology (Apr 2023)

Prediction of gastrointestinal cancers in the ONCONUT cohort study: comparison between logistic regression and artificial neural network

  • Rossella Donghia,
  • Vito Guerra,
  • Giovanni Misciagna,
  • Carmine Loiacono,
  • Antonio Brunetti,
  • Vitoantonio Bevilacqua

DOI
https://doi.org/10.3389/fonc.2023.1110999
Journal volume & issue
Vol. 13

Abstract

Background: Artificial neural networks (ANNs) and logistic regression (LR) are the models of choice in many medical data classification tasks. Several published articles have summarized the differences and similarities of these models from a technical point of view and critically assessed the quality of the models. The aim of this study was to compare ANN and LR as statistical techniques for predicting gastrointestinal cancer in an elderly cohort in Southern Italy (ONCONUT study).

Method: In 1992, ONCONUT was started with the aim of evaluating the relationship between diet and cancer development in a Southern Italian elderly population. Patients with gastrointestinal cancer (ICD-10 from 150.0 to 159.9) were included in the study (n = 3,545).

Results: This cohort was used to train and test the ANN and LR. LR was evaluated separately for macro- and micronutrients, and the accuracy was computed as the sum of true positives and true negatives divided by the total (97.15%). Then, the ANN was trained and its accuracy was evaluated (96.61% for macronutrients and 97.06% for micronutrients). To further investigate the classification capabilities of the ANN, k-fold cross-validation and a genetic algorithm (GA) were used after balancing the dataset among classes.

Conclusions: Both LR and ANN had high accuracy and similar performance. Both models have the potential to be used as clinical decision support integrated into clinical practice. In many circumstances, a simple LR model is likely to be adequate for real-world needs; in others, where large amounts of data are available, advanced analytic tools such as ANNs could be indicated, with a GA optimizer needed to optimize the accuracy of the ANN.
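As a rough illustration of the accuracy measure described in the abstract (true positives plus true negatives over the total) and of how a dataset can be partitioned for k-fold cross-validation, the following minimal Python sketch may help. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not the study's data or the authors' code.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size.

    Each fold serves once as the test set while the remaining folds are
    used for training. Shuffling and class balancing are assumed to have
    been done beforehand.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical counts, not taken from the ONCONUT cohort.
print(accuracy(tp=40, tn=930, fp=10, fn=20))  # 970 correct out of 1000
print([len(fold) for fold in k_fold_indices(10, 3)])
```

Accuracy alone can look high when one class dominates, which is one reason to balance the classes before cross-validation, as the abstract notes.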

Keywords