Human-Centric Intelligent Systems (Jun 2023)

An Explainable Artificial Intelligence Framework for the Predictive Analysis of Hypo and Hyper Thyroidism Using Machine Learning Algorithms

  • Md. Bipul Hossain,
  • Anika Shama,
  • Apurba Adhikary,
  • Avi Deb Raha,
  • K. M. Aslam Uddin,
  • Mohammad Amzad Hossain,
  • Imtia Islam,
  • Saydul Akbar Murad,
  • Md. Shirajum Munir,
  • Anupam Kumar Bairagi

DOI
https://doi.org/10.1007/s44230-023-00027-1
Journal volume & issue
Vol. 3, no. 3
pp. 211 – 231

Abstract

The thyroid gland is a crucial organ in the human body, secreting two hormones that help regulate the body's metabolism. Thyroid disease is a serious medical condition that can develop from high Thyroid Stimulating Hormone (TSH) levels or an infection in the thyroid tissues. Hypothyroidism and hyperthyroidism are two critical conditions caused by insufficient and excessive thyroid hormone production, respectively. Machine learning models can process the data generated in different medical sectors and can be used to predict several diseases. In this paper, we use different machine learning algorithms to predict hypothyroidism and hyperthyroidism. Moreover, we identify the most significant features, which can be used to detect thyroid diseases more precisely. After completing the pre-processing and feature selection steps, we applied several classification models to both our modified and original data to predict hypothyroidism and hyperthyroidism. We found that Random Forest (RF) gives the highest score on every evaluation measure on our dataset, while Naive Bayes performs very poorly. Moreover, when features are selected using the feature importance method, RF provides the best accuracy of 91.42%, with precision of 92%, recall of 92%, and F1-score of 92%. Further, by analyzing the characteristics and behavior of the dataset, we identified its most important features (TSH, T3, TT4, and FTI). Judged by accuracy and other performance evaluation criteria, this study supports the use of effective classifiers and features backed by machine learning algorithms to detect and diagnose thyroid disease. Finally, we performed an explainability analysis of our best classifier to understand the black-box internals of our machine learning model and the dataset. This study could further pave the way for researchers as well as healthcare professionals to analyze thyroid disease in real-time applications.
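
For readers unfamiliar with the evaluation criteria named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a multi-class problem. It is not the authors' code. The function name `classification_scores`, the class labels, and the example predictions are hypothetical, and macro averaging over classes is an assumption, since the abstract does not say which averaging scheme was used.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption
    made for illustration; the abstract does not specify the scheme.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, invented purely to exercise the function.
y_true = ["hypo", "hypo", "hyper", "hyper", "normal", "normal"]
y_pred = ["hypo", "normal", "hyper", "hyper", "normal", "normal"]
scores = classification_scores(y_true, y_pred)
print(scores)
```

In this toy example one hypothyroid case is misclassified as normal, which lowers recall for the hypothyroid class and precision for the normal class. This shows why the abstract reports several measures alongside accuracy.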

Keywords