Scientific Data (Oct 2024)
LungHist700: A dataset of histological images for deep learning in pulmonary pathology
Abstract
Accurate detection and classification of lung malignancies are crucial for early diagnosis, treatment planning, and patient prognosis. Conventional histopathological analysis is time-consuming, which limits its clinical applicability. To address this, we present a dataset of 691 high-resolution (1200 × 1600 pixels) histopathological lung images from 45 patients, covering adenocarcinoma, squamous cell carcinoma, and normal tissue. Both carcinoma types are further subdivided into three differentiation levels (well, moderately, and poorly differentiated), which, together with normal tissue, yields seven classes for classification. The dataset includes images at 20x and 40x magnification, reflecting real clinical diversity. We evaluated image classification using deep neural network and multiple instance learning approaches, applying each method to classify images at 20x and 40x magnification into three superclasses (adenocarcinoma, squamous cell carcinoma, and normal). Accuracies ranged from 81% to 92%, depending on the method and magnification, demonstrating the dataset's utility.
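The label hierarchy above (two carcinoma types with three differentiation levels each, plus normal tissue, collapsing to three superclasses) can be sketched as follows. This is a minimal illustration, not the dataset's own tooling: the label strings such as `adenocarcinoma_well` and the helper names `build_classes` and `to_superclass` are hypothetical, and the dataset's actual naming scheme may differ.

```python
# Hypothetical label strings; the dataset's actual file/label naming may differ.
CARCINOMA_TYPES = ("adenocarcinoma", "squamous_cell_carcinoma")
GRADES = ("well", "moderately", "poorly")  # differentiation levels


def build_classes():
    """Return the seven fine-grained classes: 2 carcinoma types x 3 grades, plus normal."""
    classes = [f"{t}_{g}" for t in CARCINOMA_TYPES for g in GRADES]
    classes.append("normal")
    return classes


def to_superclass(label):
    """Collapse a fine-grained class label into one of the three superclasses."""
    if label == "normal":
        return "normal"
    for t in CARCINOMA_TYPES:
        if label.startswith(t + "_"):
            return t
    raise ValueError(f"unknown label: {label}")


if __name__ == "__main__":
    classes = build_classes()
    print(len(classes))  # 7
    print(sorted({to_superclass(c) for c in classes}))
```

Keeping the grade as a suffix on the type name lets the same labels serve both the seven-class task and the three-superclass task evaluated in the abstract.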