npj Primary Care Respiratory Medicine (Oct 2024)

The unreliability of crackles: insights from a breath sound study using physicians and artificial intelligence

  • Chun-Hsiang Huang,
  • Chi-Hsin Chen,
  • Jing-Tong Tzeng,
  • An-Yan Chang,
  • Cheng-Yi Fan,
  • Chih-Wei Sung,
  • Chi-Chun Lee,
  • Edward Pei-Chuan Huang

DOI
https://doi.org/10.1038/s41533-024-00392-9
Journal volume & issue
Vol. 34, no. 1
pp. 1 – 7

Abstract

Background and introduction: Compared with other physical assessment methods, respiratory evaluation remains markedly inconsistent, which poses a major challenge.

Objectives: This study aims to evaluate how the ability to identify breath sounds differs between types of breath sound.

Methods/description: In this prospective study, breath sounds from the Formosa Archive of Breath Sound were labeled by five physicians. Six artificial intelligence (AI) breath sound interpretation models were developed: one based on all labeled data and one based on the labels of each of the five physicians. After labeling by the AI models and the physicians, discrepant labels were considered doubtful and were relabeled by two additional physicians. The final labels were determined by a majority vote among the physicians. The breath sound identification capability of humans and AI was evaluated using sensitivity, specificity and the area under the receiver-operating characteristic curve (AUROC).

Results/outcome: A total of 11,532 breath sound files were labeled, with 579 doubtful labels identified. After relabeling and exclusion, 305 labels had a gold standard. For wheezing, both human physicians and the AI model demonstrated good sensitivities (89.5% vs. 86.0%) and good specificities (96.4% vs. 95.2%). For crackles, both human physicians and the AI model showed good sensitivities (93.9% vs. 80.3%) but poor specificities (56.6% vs. 65.9%). AUROC values for crackle identification were lower than those for wheezing for both physicians and the AI model.

Conclusion: Even with the assistance of artificial intelligence tools, accurately identifying crackles remains more challenging than identifying wheezing. Consequently, crackles are unreliable for medical decision-making, and further examination is warranted.
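To make the evaluation terms concrete, the following is a minimal sketch of how a majority-vote gold standard and the resulting sensitivity and specificity could be computed for one breath sound class. It is an illustration only, not the authors' pipeline: the function names `majority_vote` and `sensitivity_specificity`, the binary encoding (1 = sound present, 0 = absent), and the toy labels are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label among an odd number of raters."""
    label, _count = Counter(labels).most_common(1)[0]
    return label


def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) for binary labels (1 = present)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: each row holds three raters' labels for one recording.
rater_labels = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
]

# Gold standard by majority vote, as described in the Methods.
gold = [majority_vote(row) for row in rater_labels]

# Hypothetical labels from a single reader (a physician or a model).
reader = [1, 1, 1, 0]

sens, spec = sensitivity_specificity(gold, reader)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case the reader flags every recording that truly contains the sound but also flags one that does not, which is the same pattern the abstract reports for crackles: high sensitivity alongside low specificity. AUROC is not shown because it requires continuous scores rather than hard labels.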