Applied Sciences (Jun 2022)

A Machine Learning and Radiomics Approach in Lung Cancer for Predicting Histological Subtype

  • Antonio Brunetti,
  • Nicola Altini,
  • Domenico Buongiorno,
  • Emilio Garolla,
  • Fabio Corallo,
  • Matteo Gravina,
  • Vitoantonio Bevilacqua,
  • Berardino Prencipe

DOI
https://doi.org/10.3390/app12125829
Journal volume & issue
Vol. 12, no. 12
p. 5829

Abstract

Lung cancer is one of the deadliest diseases worldwide. Computed Tomography (CT) images are a powerful tool for investigating the structure and texture of lung nodules. For a long time, trained radiologists have graded and staged cancer severity by relying on radiographic images. Recently, radiomics has been changing the traditional workflow for lung cancer staging by providing the technical and methodological means to quantify lesions analytically, enabling more accurate predictions while reducing the time each specialist spends on these tasks. In this work, we implemented a pipeline for identifying a radiomic signature, composed of a small number of features, that discriminates between adenocarcinomas and other cancer types. We also investigated the reproducibility of this radiomic study by analysing the performance of the classification models on external validation data. Specifically, we first considered two publicly available datasets, D1 and D2, composed of n = 262 and n = 89 samples, respectively. Ten features, significant according to the univariate AUC evaluated on D1, were retained. Mann–Whitney U tests recognised three of these features as having statistically different distributions, with a p-value […]. We then collected n = 51 CT images from patients with lung nodules at the Azienda Ospedaliero-Universitaria “Policlinico Riuniti” in Foggia. Resident radiologists manually annotated the lung lesions in the images to allow the subsequent analysis of the regions of malignancy. We designed a pipeline for extracting features from the Volumes of Interest in order to generate a third dataset, D3.
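The feature-selection step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it ranks each feature by a direction-agnostic univariate AUC, keeps the top k, and flags those whose two-sided Mann–Whitney U test (normal approximation with tie correction, no continuity correction) falls below a threshold alpha. The value alpha = 0.05, the helper names (`midranks`, `mann_whitney`, `select_features`) and the toy data are hypothetical; the significance threshold used in the study is not recoverable from this text. The sketch relies on the identity AUC = U / (n1 · n2), so both statistics come from a single rank computation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """U statistic of `pos` and two-sided p-value (normal approximation, tie-corrected, no continuity correction)."""
    n1, n2 = len(pos), len(neg)
    combined = list(pos) + list(neg)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def select_features(features: Dict[str, Sequence[float]], labels: Sequence[int],
                    k: int = 10, alpha: float = 0.05):
    """Keep the k features with highest direction-agnostic AUC; flag those with Mann–Whitney p < alpha (assumed threshold)."""
    scored = []
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        u, p = mann_whitney(pos, neg)
        auc = u / (len(pos) * len(neg))  # AUC = U / (n1 * n2)
        scored.append((name, max(auc, 1 - auc), p))
    top = sorted(scored, key=lambda r: r[1], reverse=True)[:k]
    significant = [name for name, _, p in top if p < alpha]
    return top, significant


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = adenocarcinoma (hypothetical toy labels)
    features = {
        "feat_a": [5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 4.0],
        "feat_b": [1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0],
    }
    print(select_features(features, labels, k=2))
```

Taking max(AUC, 1 − AUC) lets a feature that is lower in adenocarcinomas rank as highly as one that is higher; in practice a library routine such as SciPy's Mann–Whitney test would replace the hand-written version.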
Several experiments showed that the selected radiomic signature not only discriminated lung adenocarcinoma from other cancer types regardless of the dataset used to train the models, but also achieved good classification performance on external validation data: the radiomic signature computed on D1 and evaluated on the local cohort reached an AUC of 0.70 (p < 0.001) for predicting the histological subtype.
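The external-validation result can be illustrated with a sketch: fit a classifier on the signature features of one dataset, score an external cohort, and report the AUC together with a p-value. The abstract names neither the classifier nor the test behind the reported p-value, so the logistic regression (plain batch gradient descent), the one-sided label-permutation test, and all function names here (`fit_logistic`, `predict_scores`, `permutation_p_value`) are assumptions for illustration only, and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float], List[float], float]  # means, stds, weights, bias


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Model:
    """Batch gradient-descent logistic regression on z-scored features (training statistics only)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # fall back to 1.0 for constant features to avoid division by zero
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
            for j in range(d):
                gw[j] += err * z[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return means, stds, w, b


def predict_scores(model: Model, X: Sequence[Sequence[float]]) -> List[float]:
    """Linear logits; monotone in the predicted probability, so the AUC is unchanged."""
    means, stds, w, b = model
    return [sum(wj * (row[j] - means[j]) / stds[j] for j, wj in enumerate(w)) + b for row in X]


def permutation_p_value(scores: Sequence[float], labels: Sequence[int],
                        n_perm: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Observed AUC and one-sided p-value: share of label shuffles with AUC >= observed (+1 correction)."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy single-feature data standing in for a training set and an external cohort (hypothetical).
    model = fit_logistic([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]], [0, 0, 0, 1, 1, 1])
    ext_scores = predict_scores(model, [[1.5], [2.5], [5.5], [4.5]])
    print(permutation_p_value(ext_scores, [0, 0, 1, 1]))
```

The standardisation statistics are computed on the training data only and reused unchanged on the external cohort, so no information from the validation set leaks into the model. With only four external samples the permutation p-value cannot fall much below 1/6; a cohort the size of D3 would allow far smaller values.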

Keywords