Scientific Data (Jul 2024)

A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

  • Muwei Jian,
  • Hongyu Chen,
  • Zaiyong Zhang,
  • Nan Yang,
  • Haorang Zhang,
  • Lifu Ma,
  • Wenjing Xu,
  • Huixiang Zhi

DOI
https://doi.org/10.1038/s41597-024-03658-6
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 10

Abstract

Read online

Abstract Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accurately predicting multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, facilitating a finer categorization of different types of lung diseases so as to offer precise treatment recommendations. To achieve this objective, we curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (nodules are labeled as bounding boxes) from 95 distinct patients. The quality of the dataset was evaluated using a variety of classical classification and detection models, and these promising results demonstrate that the dataset has a feasible application and further facilitate intelligent auxiliary diagnosis.