Data in Brief (Apr 2024)

CAISHI: A benchmark histopathological H&E image dataset for cervical adenocarcinoma in situ identification, retrieval and few-shot learning evaluation

  • Xinyi Yang,
  • Chen Li,
  • Ruilin He,
  • Jinzhu Yang,
  • Hongzan Sun,
  • Tao Jiang,
  • Marcin Grzegorzek,
  • Xiaohan Li,
  • Chang Liu

Journal volume & issue
Vol. 53
p. 110141

Abstract

CAISHI, a benchmark histopathological Hematoxylin and Eosin (H&E) image dataset for Cervical Adenocarcinoma in Situ (AIS), is established to fill the current data gap. It contains 2240 histopathological images, of which 1010 show normal cervical glands and 1230 show cervical AIS. The samples are obtained by endoscopic biopsy, and the H&E-stained pathological sections come from Shengjing Hospital, China Medical University. The images are captured at a magnification of 100× with an Axio Scope.A1 microscope. Each image is 3840 × 2160 pixels and is stored in “.png” format. The collection of CAISHI is subject to an ethical review by China Medical University, with approval number 2022PS841K. The images are analyzed at multiple levels, including classification tasks and image retrieval tasks, and a variety of computer vision and machine learning methods are used to evaluate performance on the dataset. For classification tasks, classical machine learning methods such as k-means, support vector machines (SVM), and random forests (RF) are used, as well as deep learning classifiers such as Residual Network 50 (ResNet50), Vision Transformer (ViT), Inception version 3 (Inception-V3), and Visual Geometry Group Network 16 (VGG-16). In addition, a Siamese network is used to evaluate few-shot learning tasks. For image retrieval, color features, texture features, and deep learning features are extracted, and their performance is tested. CAISHI can support the early diagnosis and screening of cervical cancer. Researchers can use this dataset to develop new computer-aided diagnostic tools that could improve the accuracy and efficiency of cervical cancer screening and advance the development of automated diagnostic algorithms.
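To illustrate the kind of color-feature retrieval mentioned in the abstract, the following is a minimal sketch and not the authors' pipeline. It assumes a per-channel intensity histogram as the color feature and cosine similarity as the ranking measure; neither choice is stated in the abstract. The function names, the bin count, and the synthetic images are hypothetical, and the two synthetic color groups are arbitrary stand-ins for the two classes with no biological meaning.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Per-channel intensity histogram of an RGB uint8 image, normalised to sum to 1."""
    feats = []
    for channel in range(3):
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        feats.append(hist)
    vec = np.concatenate(feats).astype(np.float64)
    return vec / vec.sum()


def retrieve(query_vec, gallery_vecs, top_k=3):
    """Return indices of the top_k gallery vectors by cosine similarity, best first."""
    gallery = np.asarray(gallery_vecs)
    norms = np.linalg.norm(gallery, axis=1) * np.linalg.norm(query_vec)
    sims = gallery @ query_vec / norms
    return np.argsort(-sims)[:top_k]


def synthetic_image(mean_rgb, rng, size=(64, 64)):
    """Hypothetical stand-in for a dataset image: a noisy patch around one mean colour."""
    noise = rng.normal(0, 10, size=size + (3,))
    return np.clip(np.array(mean_rgb) + noise, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)

# Arbitrary colours chosen only so the two groups are separable (assumption).
group_a_colour = (220, 150, 190)
group_b_colour = (120, 60, 160)

gallery_images = [synthetic_image(group_a_colour, rng) for _ in range(3)]
gallery_images += [synthetic_image(group_b_colour, rng) for _ in range(3)]
gallery_labels = ["normal"] * 3 + ["AIS"] * 3

gallery_feats = [color_histogram(img) for img in gallery_images]

# Query drawn from the second group; its nearest neighbours should share its label.
query = synthetic_image(group_b_colour, rng)
top = retrieve(color_histogram(query), gallery_feats, top_k=3)
retrieved_labels = [gallery_labels[i] for i in top]
print(retrieved_labels)
```

With real data, the synthetic patches would be replaced by the 3840 × 2160 “.png” images loaded as RGB arrays, and the same feature vectors could also be passed to a classical classifier such as an SVM or RF.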

Keywords