IEEE Access (Jan 2021)

A Novel Approach of Transcriptomic microRNA Analysis Using Text Mining Methods: An Early Detection of Multiple Sclerosis Disease

  • Nehal M. Ali,
  • Mohamed Shaheen,
  • Mai S. Mabrouk,
  • Mohamed A. Aborizka

DOI
https://doi.org/10.1109/ACCESS.2021.3109069
Journal volume & issue
Vol. 9
pp. 120024 – 120033

Abstract

Multiple sclerosis is an autoimmune disease that causes psychological impacts and severe physical disabilities, including motor disabilities and partial blindness. This work introduces an early detection method for multiple sclerosis based on the analysis of transcriptomic microRNA data. By transforming this phenotype classification problem into a text mining problem, multiple sclerosis biomarkers can be obtained. To our knowledge, text mining methods have not previously been applied to the transcriptomic data analysis of multiple sclerosis. Hence, this work presents a complete predictive model that combines consecutive transcriptomic data preprocessing procedures with the proposed KmerFIDF method for feature extraction and linear discriminant analysis for dimensionality reduction. Predictive machine learning methods are then applied to the resulting features. This study describes experimental work on a transcriptomic dataset of noncoding microRNA sequences obtained from relapsing-remitting multiple sclerosis patients before fingolimod treatment and after six consecutive months of treatment. With the proposed model, the predictive methods report sensitivity, specificity, F1-score, and average accuracy scores of 96.4%, 96.47%, 95.6%, and 97% with random forest; 92.89%, 92.78%, 93.2%, and 94% with support vector machine; and 91.95%, 92.2%, 93.1%, and 94% with logistic regression, respectively. These promising results support the introduced model and the proposed KmerFIDF method in transcriptomic data analysis. Moreover, comparative experiments are conducted with two referenced studies. The results show that the average accuracy scores of the proposed model exceed those reported in the two referenced studies.
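
The abstract does not define KmerFIDF, so the following is only a minimal sketch of the general idea it describes: treating each microRNA sequence as a text document whose "words" are overlapping k-mers, then weighting those k-mers with a TF-IDF-style score. The function names `kmers` and `kmer_tfidf`, the choice of k = 3, the smoothed IDF formula, and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (the 'words' of the document)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_tfidf(sequences, k=3):
    """Return (vocabulary, rows): one TF-IDF weight vector per input sequence.

    Assumed weighting (not confirmed by the source):
      tf  = k-mer count / total k-mers in the sequence
      idf = ln((1 + n) / (1 + df)) + 1   (a common smoothed variant)
    """
    docs = [Counter(kmers(s, k)) for s in sequences]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    # Document frequency: in how many sequences each k-mer occurs.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for d in docs:
        total = sum(d.values())
        # Sequences shorter than k yield no k-mers and get an all-zero row.
        rows.append([(d[t] / total) * idf[t] if total else 0.0 for t in vocab])
    return vocab, rows


if __name__ == "__main__":
    # Toy RNA-like strings, made up for this example.
    toy = ["ACGUACGU", "ACGUUUUU", "GGGGACGU"]
    vocab, rows = kmer_tfidf(toy, k=3)
    for seq, row in zip(toy, rows):
        top = max(range(len(vocab)), key=lambda i: row[i])
        print(seq, "highest-weighted k-mer:", vocab[top])
```

In a pipeline of the kind the abstract outlines, vectors like these would next be reduced with linear discriminant analysis and passed to a classifier such as random forest; those steps are omitted here because the abstract gives no parameters for them.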

Keywords