Preparing CT imaging datasets for deep learning in lung nodule analysis: Insights from four well-known datasets

Jingxuan Wang; Nikos Sourlos; Sunyi Zheng; Nils van der Velden; Gert Jan Pelgrim; Rozemarijn Vliegenthart; Peter van Ooijen

Heliyon (Jun 2023)

Affiliations

Jingxuan Wang (corresponding author): Department of Radiology, University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands
Nikos Sourlos: Department of Radiology, University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands
Sunyi Zheng: School of Engineering, Westlake University, Xihu District, 310030, Hangzhou, China
Nils van der Velden: Department of Radiology, University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands
Gert Jan Pelgrim: Department of Radiology, University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands
Rozemarijn Vliegenthart: Department of Radiology, University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands; Data Science Center in Health (DASH), University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands
Peter van Ooijen (corresponding author): Department of Radiation Oncology, University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands; Data Science Center in Health (DASH), University of Groningen, University Medical Center of Groningen, 9713GZ, Groningen, the Netherlands

Journal volume & issue: Vol. 9, no. 6
p. e17104

Abstract

Background: Deep learning is an important means of automating the detection, segmentation, and classification of pulmonary nodules in computed tomography (CT) images. An entire CT scan cannot be used directly by deep learning models because of image size, image format, image dimensionality, and other factors. Between acquiring the CT scan and feeding the data into the deep learning model, there are several steps, including obtaining data use permission, data access and download, data annotation, and data preprocessing. This paper aims to provide a complete and detailed guide for researchers who want to engage in interdisciplinary research combining lung nodule analysis on CT images with Artificial Intelligence (AI) engineering.

Methods: The data preparation pipeline used the following four popular large-scale datasets: LIDC-IDRI (Lung Image Database Consortium image collection), LUNA16 (Lung Nodule Analysis 2016), NLST (National Lung Screening Trial), and NELSON (the Dutch-Belgian Randomized Lung Cancer Screening Trial). The dataset preparation is presented in chronological order.

Findings: The different data preparation steps required before deep learning were identified. These include both generic steps and steps specific to lung nodule research. For each step, the required process, its necessity, and example code or tools for implementation are provided.

Discussion and conclusion: Depending on the specific research question, researchers should be aware of the various preparation steps required and carefully select datasets, data annotation methods, and image preprocessing methods. Moreover, it is vital to acknowledge that each auxiliary tool or piece of code has its own scope of use and limitations. This paper proposes a standardized data preparation process and clearly demonstrates the principles and sequence of the different steps. A data preparation pipeline can be realized quickly by following the proposed steps and using the suggested example code and tools.
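As an illustration of why an entire CT scan is not fed to a model directly, the following is a minimal sketch of two common preprocessing operations: intensity windowing and patch extraction. It is not the paper's own code. The function names, the Hounsfield-unit window of -1000 to 400, and the patch size are assumptions chosen for illustration only.

```python
import numpy as np


def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Clip a CT volume to an HU window and scale it to [0, 1].

    The window limits are illustrative assumptions, not values from the paper.
    """
    clipped = np.clip(volume.astype(np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


def extract_patch(volume, center, size):
    """Cut a cubic patch with edge length `size` around `center` (z, y, x).

    The volume is zero-padded first, so patches near the border keep
    the requested shape.
    """
    half = size // 2
    padded = np.pad(volume, half, mode="constant", constant_values=0)
    # After padding, original index c sits at c + half.
    z, y, x = (int(c) + half for c in center)
    return padded[
        z - half:z - half + size,
        y - half:y - half + size,
        x - half:x - half + size,
    ]


# Synthetic stand-in for a CT volume: air everywhere, one dense voxel.
scan = np.full((8, 8, 8), -1000, dtype=np.int16)
scan[4, 4, 4] = 400
patch = extract_patch(normalize_hu(scan), center=(4, 4, 4), size=4)
```

In this sketch the volume is normalized before cropping, so the zero padding corresponds to the lower end of the window (air) rather than to an arbitrary raw intensity.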

Published in Heliyon

ISSN: 2405-8440 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General); Social Sciences: Social sciences (General)
Website: https://www.cell.com/heliyon/home
