Predicting cervical cancer risk probabilities using advanced H20 AutoML and local interpretable model-agnostic explanation techniques

Sashikanta Prusty; Srikanta Patnaik; Sujit Kumar Dash; Sushree Gayatri Priyadarsini Prusty; Jyotirmayee Rautaray; Ghanashyam Sahoo

doi:10.7717/peerj-cs.1916

PeerJ Computer Science (May 2024)

Affiliations

Sashikanta Prusty: Department of Computer Science and Engineering, Siksha O Anusandhan University Institute of Technical Education and Research, Bhubaneswar, Odisha, India
Srikanta Patnaik: Director of IIMT, Interscience Institute of Management and Technology, Bhubaneswar, Odisha, India
Sujit Kumar Dash: P & IT, Biju Pattanaik University of Technology, Rourkela, Odisha, India
Sushree Gayatri Priyadarsini Prusty: Department of Computer Science and Engineering, Siksha O Anusandhan University Institute of Technical Education and Research, Bhubaneswar, Odisha, India
Jyotirmayee Rautaray: Department of Computer Science, Odisha University of Technology and Research, Bhubaneswar, Odisha, India
Ghanashyam Sahoo: Department of Computer Science and Engineering, GITA Autonomous College, Bhubaneswar, Odisha, India

DOI: https://doi.org/10.7717/peerj-cs.1916
Journal volume & issue: Vol. 10
p. e1916

Abstract

Background: Cancer, the abnormal growth of cells that can arise anywhere in the human body, remains a major global health concern, particularly for middle-aged people. Cervical cancer, also known as cervix cancer, develops in the female cervix. Most cervical cancers begin in the area where the endocervix (upper two-thirds of the cervix) and the ectocervix (lower third of the cervix) meet. Despite an influx of people entering the healthcare industry, the demand for machine learning (ML) specialists has recently outpaced the supply. To close this gap, user-friendly applications such as H2O have made significant progress. Traditional ML techniques handle each stage of the process separately, whereas H2O AutoML automates a major portion of the ML workflow, such as training and tuning multiple models within a user-defined time frame.

Methods: This research proposes combining H2O AutoML with local interpretable model-agnostic explanations (LIME) to enhance the predictability of an ML model within a user-defined time frame. The cervical cancer dataset was collected from the freely available Kaggle repository. The Stacked Ensembles approach automatically trains H2O models to create a highly predictive ensemble model that, in most instances, is the top-performing model on the AutoML Leaderboard. The novelty of this research lies in training the best model with AutoML, which reduces human effort and time compared with traditional ML techniques. Additionally, LIME was applied to the H2O AutoML model to open up the black box and explain each individual prediction. Model performance was evaluated using the findprediction() function on three different idx values (i.e., 100, 120, and 150) to find the prediction probabilities of the two classes for each feature. The experiments were run on a Lenovo laptop with a Core i7 processor and an NVIDIA GeForce 860M GPU under Windows 10, using Python 3.8.3 on Jupyter 6.4.3.

Results: The proposed model produced feature-dependent prediction probabilities of 87%, 95%, and 87% for class ‘0’ and 13%, 5%, and 13% for class ‘1’ when idx_value = 100, 120, and 150, respectively, in the first case, and 100% for class ‘0’ and 0% for class ‘1’ when idx_value = 10, 12, and 15. A comparative analysis shows that the proposed model outperforms previous results reported in cervical cancer research.
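The idea behind a LIME-style explanation of a single prediction can be illustrated with a minimal, standard-library sketch. This is not the authors' code, and it uses neither the h2o nor the lime package. The function black_box is a hypothetical stand-in for a trained classifier, and the three feature names, the perturbation scale, and the kernel width are assumptions chosen only for illustration. The sketch perturbs one instance, queries the model, weights the samples by proximity, and fits a weighted linear surrogate whose coefficients act as per-feature contributions.

```python
import math
import random

def black_box(x):
    """Hypothetical stand-in classifier: returns P(class 1) for a feature vector."""
    z = 1.5 * x[0] - 2.0 * x[1] + 0.0 * x[2]  # third feature is deliberately irrelevant
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def explain_instance(predict, x, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around x.

    Returns (intercept, per-feature coefficients) of the local model.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]           # perturbed neighbour
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        rows.append([1.0] + [zi - xi for zi, xi in zip(z, x)])  # intercept + offsets from x
        targets.append(predict(z))                              # black-box output
        weights.append(math.exp(-dist2 / kernel_width ** 2))    # closer samples count more
    k = len(x) + 1
    # Weighted normal equations: (X^T W X) w = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets)) for i in range(k)]
    coef = solve(xtwx, xtwy)
    return coef[0], coef[1:]

# Explain one hypothetical instance and report both class probabilities.
instance = [0.2, 0.1, 0.5]
intercept, contributions = explain_instance(black_box, instance)
p1 = black_box(instance)
print(f"P(class 0) = {1 - p1:.2f}, P(class 1) = {p1:.2f}")
for name, c in zip(["feature_a", "feature_b", "feature_c"], contributions):
    print(f"{name}: {c:+.3f}")
```

In this sketch a positive coefficient pushes the prediction toward class 1 and a negative one toward class 0, while the irrelevant third feature receives a coefficient near zero. The lime package adds steps omitted here, such as feature discretization and feature selection.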

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
