IEEE Access (Jan 2023)

Handling Big Microarray Data: A Novel Approach to Design Accurate Fuzzy-Based Medical Expert System

  • Ganeshkumar Pugalendhi,
  • M. Mazhar Rathore,
  • Dhirendra Shukla,
  • Anand Paul

DOI
https://doi.org/10.1109/ACCESS.2023.3257875
Journal volume & issue
Vol. 11
pp. 35182 – 35196

Abstract

Gene data produced by microarray experiments is complex in both its dimensionality and its number of samples. Processing it for disease analysis with an expert system consumes considerable computation power and time. At the same time, the data can help doctors identify a patient’s health condition if it is presented in a meaningful way and processed on time. Several methods have been proposed to reduce the dimensions of medical microarray data and optimize its search space with minimal accuracy loss. However, the discretization of continuous gene values during dimension reduction fails to preserve the inherent meaning of the genes. Also, ensuring high accuracy and interpretability in the reduction process may incur extra processing time, which is unfavorable for time-critical applications. To overcome these issues, we propose a dimension reduction method in conjunction with a fuzzy expert system (FES) optimization approach that keeps the accuracy-interpretability-speed tradeoff in mind. To accomplish this, we use a fuzzy rough set on $f$-information to identify meaningful genes without changing their original values. We propose a conditionally guided particle swarm optimization for faster knowledge acquisition, in which the velocity is adjusted based on a predefined update probability, resulting in a faster search. A big data processing architecture is designed using the Hadoop ecosystem, along with a MapReduce-equivalent algorithm of the proposed method, enabling parallel processing of microarray data to reduce dimensions and perform classification through knowledge extraction. The proposed method is thoroughly tested on eleven microarray datasets with respect to the accuracy-interpretability-speed tradeoff. The results show that the proposed method is effective in identifying disease-causing genes while also understanding the patient’s genetic profile with only a few operations and a small amount of CPU time. Statistical tests are also run to validate the proposed method’s efficacy in comparison to other methods.
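
The abstract describes a particle swarm optimization whose velocity is adjusted according to a predefined update probability. The sketch below is one possible reading of that idea, not the authors' algorithm: it assumes that each velocity component is recomputed only with probability `update_prob` and otherwise reused unchanged. The function name `gated_pso`, all parameter names and defaults, and the `sphere` test objective are hypothetical and chosen only for illustration.

```python
import random


def gated_pso(objective, dim, n_particles=20, iters=100,
              update_prob=0.5, w=0.7, c1=1.5, c2=1.5,
              bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a PSO whose velocity update is gated.

    Assumption (not taken from the paper): a velocity component is
    recomputed only with probability `update_prob`; otherwise the
    previous velocity is kept, which skips part of the arithmetic.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                if rng.random() < update_prob:
                    # Standard inertia + cognitive + social update.
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


best, best_val = gated_pso(sphere, dim=3)
```

In a fuzzy expert system setting, the objective would presumably score a candidate rule base on the selected genes rather than a toy function; that encoding is not specified in the abstract and is left out here.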

Keywords