Informatics in Medicine Unlocked (Jan 2021)
Conditional sliding windows: An approach for handling data limitation in colorectal histopathology image classification
Abstract
Training a convolutional neural network (CNN) requires large amounts of data. Small datasets with low variation cause over-fitting, and the resulting model cannot predict new data with high accuracy. Histopathological medical data are also scarce, because they cannot be obtained without ethical approval. This study therefore proposes a conditional sliding window algorithm to extract sub-samples from histopathology images.

Two sets of original data were used: the Warwick dataset, with images of 775 × 522 pixels, and a dataset from the Department of Pathology and Anatomy, Faculty of Medicine, Universitas Indonesia. The algorithm was inspired by the conventional sliding window method but adds conditions. The window starts from the left of the image in (x, y) pixel coordinates and moves from left to right, then from top to bottom, until the entire image is covered. This produced new images of two sizes, 200 × 200 and 300 × 300 pixels. To avoid loss of information, overlaps of 25 and 50 pixels were used. A CNN architecture, CNN 7-5-7, was designed and proposed to perform the classification.

The conditional sliding window algorithm produces a varying number of sub-samples depending on the image and window size. Models developed with the generated images predicted benign and malignant tissue more accurately than the model trained on the original dataset. Moreover, the sensitivity values on both the Warwick public dataset and the dataset generated in this study are above 0.80, which indicates that the proposed CNN architecture is more stable than existing methods such as AlexNet and DenseNet121.

This study addresses the limited availability of colorectal histopathological training data by developing a conditional sliding window algorithm, which can also be applied to generate other histopathological data.
In addition, the proposed CNN 7-5-7 was the fastest architecture to train while remaining comparable to state-of-the-art methods. The generated dataset was also used to develop a colorectal cancer identification model, which was integrated into a web-based application for further implementation.
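The abstract describes the window traversal (left to right, then top to bottom, with 25- or 50-pixel overlaps) but does not spell out the exact conditions. The following Python sketch is therefore only one plausible reading, not the authors' implementation. It assumes a stride of (window − overlap) pixels and, as a hypothetical border condition, snaps one extra window to the image edge whenever the regular steps leave an uncovered strip. The function names `window_starts` and `sliding_windows` are illustrative.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def window_starts(length: int, window: int, overlap: int) -> List[int]:
    """Start offsets of square windows along one image axis.

    Assumption (not from the paper): windows advance by (window - overlap)
    pixels, and if the last regular step leaves an uncovered strip, one
    extra window is aligned to the image edge so the whole axis is covered.
    """
    if overlap < 0 or overlap >= window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if window > length:
        return []  # image too small for this window: no sub-samples
    stride = window - overlap
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)  # edge-aligned final window
    return starts


def sliding_windows(width: int, height: int, window: int, overlap: int) -> List[Box]:
    """Window boxes ordered left to right, then top to bottom."""
    xs = window_starts(width, window, overlap)
    ys = window_starts(height, window, overlap)
    return [(x, y, window, window) for y in ys for x in xs]


if __name__ == "__main__":
    # Image size of the Warwick dataset quoted in the abstract: 775 x 522 pixels.
    for window, overlap in [(200, 25), (300, 50)]:
        boxes = sliding_windows(775, 522, window, overlap)
        print(window, overlap, len(boxes), boxes[:3])
```

Under these assumptions, a 775 × 522 image yields 15 windows of 200 × 200 pixels with a 25-pixel overlap, and 6 windows of 300 × 300 pixels with a 50-pixel overlap. With the image loaded as a NumPy array `img` of shape (height, width, channels), each box is cropped as `img[y:y + h, x:x + w]`. Snapping the last window to the edge trades a larger overlap at the border for full coverage. Simply dropping partial windows would be an equally valid condition, at the cost of losing the border strip.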