CAISHI: A benchmark histopathological H&E image dataset for cervical adenocarcinoma in situ identification, retrieval and few-shot learning evaluation

Xinyi Yang; Chen Li; Ruilin He; Jinzhu Yang; Hongzan Sun; Tao Jiang; Marcin Grzegorzek; Xiaohan Li; Chang Liu

Data in Brief (Apr 2024)

Affiliations

Xinyi Yang: Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, Liaoning 110167, China
Chen Li: Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, Liaoning 110167, China; Corresponding author at: Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China.
Ruilin He: Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, Liaoning 110167, China
Jinzhu Yang: Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, Liaoning 110167, China
Hongzan Sun: Shengjing Hospital of China Medical University, Shenyang, Liaoning 110001, China
Tao Jiang: School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 610075, China; International Joint Institute of Robotics and Intelligent Systems, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China
Marcin Grzegorzek: Institute for Medical Informatics, University of Luebeck, Ratzeburger Allee 160, 23538 Luebeck, Federal Republic of Germany; Department of Knowledge Engineering, University of Economics in Katowice, Katowice 50 40-287, Poland
Xiaohan Li: Shengjing Hospital of China Medical University, Shenyang, Liaoning 110001, China
Chang Liu: Shengjing Hospital of China Medical University, Shenyang, Liaoning 110001, China

Journal volume & issue: Vol. 53
p. 110141

Abstract

CAISHI, a benchmark histopathological Hematoxylin and Eosin (H&E) image dataset for cervical Adenocarcinoma in Situ (AIS), is established to fill the current data gap. It contains 2240 histopathological images, of which 1010 show normal cervical glands and 1230 show cervical AIS. Samples are taken by endoscopic biopsy, and the H&E-stained pathological sections are obtained from Shengjing Hospital, China Medical University. The images are captured at 100× magnification with an Axio Scope.A1 microscope. Each image is 3840 × 2160 pixels and is stored in “.png” format. The collection of CAISHI was subject to ethical review by China Medical University, with approval number 2022PS841K. The images are analyzed at multiple levels, including classification tasks and image retrieval tasks, and a variety of computer vision and machine learning methods are used to evaluate the dataset. For classification, classical machine learning methods such as k-means, support vector machines (SVM), and random forests (RF) are used, as well as deep learning classifiers such as Residual Network 50 (ResNet50), Vision Transformer (ViT), Inception version 3 (Inception-V3), and Visual Geometry Group Network 16 (VGG-16). In addition, a Siamese network is used to evaluate few-shot learning. For image retrieval, color features, texture features, and deep learning features are extracted and their performance is tested. CAISHI can help with the early diagnosis and screening of cervical cancer. Researchers can use this dataset to develop new computer-aided diagnostic tools that could improve the accuracy and efficiency of cervical cancer screening and advance the development of automated diagnostic algorithms.

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
