Systematic Evaluation of Image Tiling Adverse Effects on Deep Learning Semantic Segmentation

G. Anthony Reina; Ravi Panchumarthy; Siddhesh Pravin Thakur; Alexei Bastidas; Spyridon Bakas

doi:10.3389/fnins.2020.00065

Frontiers in Neuroscience (Feb 2020)

Affiliations

G. Anthony Reina: Intel Corporation, Santa Clara, CA, United States
Ravi Panchumarthy: Intel Corporation, Santa Clara, CA, United States
Siddhesh Pravin Thakur: Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, United States
Alexei Bastidas: Intel Corporation, Santa Clara, CA, United States
Spyridon Bakas: Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, United States
Spyridon Bakas: Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
Spyridon Bakas: Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States

Journal volume & issue: Vol. 14

Abstract

Convolutional neural network (CNN) models obtain state-of-the-art performance on image classification, localization, and segmentation tasks. Limitations in computer hardware, most notably the memory size of deep learning accelerator cards, prevent relatively large images, such as those from medical and satellite imaging, from being processed as a whole at their original resolution. A fully convolutional topology, such as U-Net, is typically trained on down-sampled images and then applied at inference time to images of their original size and resolution by dividing the larger image into smaller (typically overlapping) tiles, making predictions on these tiles, and stitching them back together as the prediction for the whole image. In this study, we show that this tiling technique, combined with the translationally invariant nature of CNNs, causes small but relevant differences during inference that can be detrimental to the performance of the model. Here we quantify these variations in both medical (i.e., BraTS) and non-medical (i.e., satellite) images and show that training a 2D U-Net model on the whole image substantially improves the overall model performance. Finally, we compare 2D and 3D semantic segmentation models to show that providing CNN models with a wider context of the image in all three dimensions leads to more accurate and consistent predictions. Our results suggest that tiling the input to CNN models, while perhaps necessary to overcome the memory limitations of computer hardware, may lead to undesirable and unpredictable errors in the model's output that can only be adequately mitigated by increasing the input of the model to the largest possible tile size.
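
The tile, predict, and stitch procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names (tile_starts, predict_tiled, box_blur), the tile size, the stride, and the choice to average overlapping predictions are all assumptions made for illustration, and a 3x3 mean filter with zero padding stands in for a trained U-Net. The stand-in is only meant to show that any predictor whose output depends on neighboring pixels can give different answers near tile borders than it gives on the whole image.

```python
import numpy as np


def tile_starts(length, tile, stride):
    """Start offsets of windows of size `tile` that together cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # last window sits flush with the border
    return starts


def predict_tiled(image, predict, tile=64, stride=48):
    """Run `predict` on overlapping tiles and average the overlaps (assumed stitching rule)."""
    h, w = image.shape
    total = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            patch = image[y:y + tile, x:x + tile]
            total[y:y + tile, x:x + tile] += predict(patch)
            count[y:y + tile, x:x + tile] += 1.0
    return total / count


def box_blur(patch):
    """Stand-in 'model': 3x3 mean with zero padding, so each output pixel depends on context."""
    padded = np.pad(patch, 1, mode="constant")
    h, w = patch.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


rng = np.random.default_rng(0)
image = rng.random((160, 160))

whole = box_blur(image)                                   # prediction on the full image
stitched = predict_tiled(image, box_blur, tile=64, stride=48)  # tiled prediction
max_gap = float(np.abs(whole - stitched).max())           # nonzero: tile borders lack context
print("largest whole-vs-tiled difference:", max_gap)
```

In this toy setting the two outputs agree in tile interiors and diverge only where a tile edge cuts off the neighborhood a pixel would otherwise see. A predictor that uses no context at all (for example, a per-pixel threshold) would stitch back without any difference, which is consistent with the abstract's point that wider context, and hence larger tiles, reduces the discrepancy.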

Published in Frontiers in Neuroscience

ISSN: 1662-4548 (Print); 1662-453X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: http://www.frontiersin.org/neuroscience