Cesarean Section Classification Using Machine Learning With Feature Selection, Data Balancing, and Explainability

Nahid Sultan; Mahmudul Hasan; Md. Ferdous Wahid; Hasi Saha; Ahsan Habib

doi:10.1109/ACCESS.2023.3303342

IEEE Access (Jan 2023)

Affiliations

Nahid Sultan: Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
Mahmudul Hasan: Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
Md. Ferdous Wahid: Department of Electrical and Electronic Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
Hasi Saha: Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
Ahsan Habib: School of Information Technology, Deakin University, Geelong, VIC, Australia

DOI: https://doi.org/10.1109/ACCESS.2023.3303342
Journal volume & issue: Vol. 11
pp. 84487 – 84499

Abstract

Disease samples are naturally fewer than healthy samples, which introduces bias into the training of machine learning (ML) models. The current study focuses on learning discriminating patterns between cesarean and non-cesarean cases from a dataset of 161 features covering 692 cesarean and 5465 non-cesarean samples, which comes in four folds corresponding to four different hospitals (hospitals A, B, C, and D). The dataset is noisy and contains missing values, its features are on different scales, and, above all, 161 features is a large number that risks including information unnecessary for separating the C-section class from the non-cesarean class. This study introduces a data pre-processing pipeline that addresses data imbalance, handles missing values, and identifies and removes outliers, among other issues. A novel ensemble model is proposed that consistently performs better irrespective of data volume (data folds A, A+B, A+B+C, and A+B+C+D) and pre-processing pipeline, achieving 96-99% accuracy across data volumes. Finally, the proposed model's decision-making is explained in terms of prominent features: higher values of features such as Episiotomy, age of women, and Fetal intrapartum pH account for C-section outcomes.
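
The abstract does not name the specific techniques used at each pre-processing step, so the following is only a minimal sketch of the kind of pipeline it describes, under stated assumptions: median imputation for missing values, min-max scaling for features on different scales, and random oversampling of the minority class for imbalance (the reported counts, 692 versus 5465, are roughly 1:8). The function names and the toy values are hypothetical and are not taken from the paper or its dataset.

```python
import random
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    cols = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Made-up toy rows with two numeric features and some missing entries (None).
# Label 1 marks the minority class, label 0 the majority class.
X = [[25.0, 7.30], [31.0, None], [None, 7.10], [40.0, 7.25], [22.0, 7.35], [35.0, 7.20]]
y = [0, 0, 0, 0, 1, 0]

X_clean = min_max_scale(impute_median(X))
X_bal, y_bal = random_oversample(X_clean, y)
```

In practice, balancing of this kind would normally be applied only to the training split, so that evaluation data keep the original class ratio. Outlier removal, feature selection, and the ensemble model itself are left out of the sketch because the abstract gives no detail on how they are implemented.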

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
