Scientific Data (Jul 2024)

Annotated Pap cell images and smear slices for cell classification

  • David Kupas,
  • Andras Hajdu,
  • Ilona Kovacs,
  • Zoltan Hargitai,
  • Zita Szombathy,
  • Balazs Harangi

DOI
https://doi.org/10.1038/s41597-024-03596-3
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 8

Abstract

Read online

Abstract Machine learning-based systems have become instrumental in augmenting global efforts to combat cervical cancer. A burgeoning area of research focuses on leveraging artificial intelligence to enhance the cervical screening process, primarily through the exhaustive examination of Pap smears, traditionally reliant on the meticulous and labor-intensive analysis conducted by specialized experts. Despite the existence of some comprehensive and readily accessible datasets, the field is presently constrained by the limited volume of publicly available images and smears. As a remedy, our work unveils APACC (Annotated PAp cell images and smear slices for Cell Classification), a comprehensive dataset designed to bridge this gap. The APACC dataset features a remarkable array of images crucial for advancing research in this field. It comprises 103,675 annotated cell images, carefully extracted from 107 whole smears, which are further divided into 21,371 sub-regions for a more refined analysis. This dataset includes a vast number of cell images from conventional Pap smears and their specific locations on each smear, offering a valuable resource for in-depth investigation and study.