Scientific Reports (Apr 2024)

Enhancing diagnosis of benign lesions and lung cancer through ensemble text and breath analysis: a retrospective cohort study

  • Hao Wang,
  • Yinghua Wu,
  • Meixiu Sun,
  • Xiaonan Cui

DOI
https://doi.org/10.1038/s41598-024-59474-w
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 9

Abstract


Early diagnosis of lung cancer (LC) can significantly reduce its mortality rate. Because computed tomography (CT)-based diagnosis suffers from a high false positive rate and depends on radiologists' experience, a multi-modal early LC screening model that combines radiology with other non-invasive, rapid detection methods is warranted. We propose a high-resolution, multi-modal, and low-differentiation LC screening strategy, named ensemble text and breath analysis (ETBA), that combines radiology report text analysis with breath analysis. In total, 231 participants (140 LC patients and 91 benign lesion [BL] patients) were examined using proton transfer reaction–time of flight–mass spectrometry and CT screening. Participants were randomly assigned to a training set and a validation set (4:1) with stratification. The report section of the radiology reports was used to train a text analysis (TA) model with a natural language processing algorithm. Twenty-two volatile organic compounds (VOCs) in the exhaled breath and the prediction results of the TA model were used as predictors to develop the ETBA model with an extreme gradient boosting algorithm. A breath analysis (BA) model was developed from the 22 VOCs alone. The BA and TA models were compared with the ETBA model. On the validation set, the ETBA model achieved a sensitivity of 94.3%, a specificity of 77.3%, and an accuracy of 87.7%, whereas radiologist diagnosis achieved a sensitivity of 74.3%, a specificity of 59.1%, and an accuracy of 68.1%. The ETBA model thus obtained higher sensitivity and specificity than radiologist diagnosis, and it has the potential to improve the sensitivity and specificity of CT screening for LC. The approach is rapid, non-invasive, multi-dimensional, and accurate for distinguishing LC from BL.
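
The stratified 4:1 assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_split`, the fixed seed, and the rounding rule are assumptions made for illustration, and the abstract does not state the exact set sizes used in the study. Only the class counts (140 LC, 91 BL) and the 4:1 ratio come from the text.

```python
import random


def stratified_split(labels, valid_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same ratio.

    Hypothetical helper: the abstract only says the split was random,
    4:1, and stratified; the rounding rule here is an assumption.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out the same fraction of every class for validation.
        n_valid = round(len(indices) * valid_fraction)
        valid_idx.extend(indices[:n_valid])
        train_idx.extend(indices[n_valid:])

    return sorted(train_idx), sorted(valid_idx)


# Class counts taken from the abstract: 140 LC and 91 BL participants.
labels = ["LC"] * 140 + ["BL"] * 91
train_idx, valid_idx = stratified_split(labels)
```

Splitting within each class, rather than over the pooled sample, keeps the LC-to-BL proportion of the validation set close to that of the full cohort, which matters when sensitivity and specificity are both reported.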

Keywords