Data Balancing Based on Pre-Training Strategy for Liver Segmentation from CT Scans

Yong Zhang; Yi Wang; Yizhu Wang; Bin Fang; Wei Yu; Hongyu Long; Hancheng Lei

doi:10.3390/app9091825

Applied Sciences (May 2019)

Affiliations

Yong Zhang: College of Computer Science, Chongqing University, No.174 Shazhengjie, Shapingba, Chongqing 400044, China
Yi Wang: College of Computer Science, Chongqing University, No.174 Shazhengjie, Shapingba, Chongqing 400044, China
Yizhu Wang: Ziwei king star Digital Technology Co., Ltd., Nine Floors of G4 A Block, Phase 2 Innovation Industrial Park, Hefei High-tech Zone, Hefei 230000, China
Bin Fang: College of Computer Science, Chongqing University, No.174 Shazhengjie, Shapingba, Chongqing 400044, China
Wei Yu: College of Computer Science, Chongqing University, No.174 Shazhengjie, Shapingba, Chongqing 400044, China
Hongyu Long: College of Computer Science, Chongqing University, No.174 Shazhengjie, Shapingba, Chongqing 400044, China
Hancheng Lei: College of Computer Science, Chongqing University, No.174 Shazhengjie, Shapingba, Chongqing 400044, China

DOI: https://doi.org/10.3390/app9091825
Journal volume & issue: Vol. 9, no. 9, p. 1825

Abstract

Data imbalance is often encountered in deep learning and is harmful to model training. An imbalance between hard and easy samples in the training dataset often occurs in segmentation tasks on Computed Tomography (CT) scans. However, this imbalance is hard to solve with traditional methods, for two reasons: adjacent slices in a volume are strongly similar, and sample difficulty depends on the task (the same slice may be a hard sample in the liver segmentation task but an easy sample in the kidney or spleen segmentation task). In this work, we use a pre-training strategy to distinguish hard and easy samples, and then increase the proportion of hard slices in the training dataset. This mitigates the imbalance between hard and easy samples and enhances the contribution of hard samples to the training process. Our experiments on liver, kidney, and spleen segmentation show that increasing the ratio of hard samples in the training dataset enhances the model's predictive ability by improving its handling of hard samples. The main contribution of this work is the application of the pre-training strategy, which enables us to select training samples online according to the task and to ease data imbalance in the training dataset.
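
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how hard and easy slices are told apart or what proportion of hard slices is used, so everything below is an assumption made for illustration: a slice is treated as hard when the pre-trained model's per-slice Dice score falls below a chosen threshold, and hard slices are then oversampled by random duplication until they reach a chosen share of the training list. The function names `dice_score`, `split_hard_easy`, and `rebalance`, and the parameters `threshold` and `hard_ratio`, are hypothetical and do not come from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1.0 if both are empty)."""
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


def split_hard_easy(
    scores: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Assumed criterion: a slice is 'hard' if the pre-trained model scores below threshold."""
    hard = sorted(s for s, d in scores.items() if d < threshold)
    easy = sorted(s for s, d in scores.items() if d >= threshold)
    return hard, easy


def rebalance(
    hard: List[str], easy: List[str], hard_ratio: float, seed: int = 0
) -> List[str]:
    """Duplicate hard slices at random until they form about hard_ratio of the list."""
    if not 0.0 < hard_ratio < 1.0:
        raise ValueError("hard_ratio must be strictly between 0 and 1")
    if not hard or not easy:
        return list(hard) + list(easy)
    # Solve n_hard / (n_hard + n_easy) = hard_ratio for n_hard; never drop hard slices.
    target_hard = max(len(hard), round(hard_ratio * len(easy) / (1.0 - hard_ratio)))
    rng = random.Random(seed)
    extra = [rng.choice(hard) for _ in range(target_hard - len(hard))]
    return hard + extra + easy


# Hypothetical per-slice Dice scores produced by a pre-trained model.
slice_scores = {"slice_a": 0.95, "slice_b": 0.50, "slice_c": 0.90, "slice_d": 0.97}
hard_ids, easy_ids = split_hard_easy(slice_scores, threshold=0.8)
train_ids = rebalance(hard_ids, easy_ids, hard_ratio=0.5)
```

Because the scores come from a model pre-trained on the task at hand, rerunning the split for kidney or spleen segmentation would yield a different hard set from the same volumes, which matches the task-dependent difficulty the abstract describes.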

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
