Informatics in Medicine Unlocked (Jan 2022)
Sequential machine learning in prediction of common cancers
Abstract
Cancer is one of the most common causes of death in the world. It is characterized by the multi-stage transformation of normal cells into tumor cells. Early cancer detection can significantly reduce its consequences and has been the objective of many published machine learning (ML) studies. However, most of them focused on microarray, gene expression, or publicly available medical datasets. Almost none offered an approach for predicting cancer through analysis of sequential data, such as Electronic Health Record (EHR) data.

This paper presents a sequential ML approach to predict the occurrence of lung cancer, breast cancer, cervical cancer, and liver cell cancer using EHR data. The accuracy of sequence learning models based on long short-term memory (LSTM) and bidirectional gated recurrent units (GRU) was compared to that of traditional ML methods based on multilayer perceptron, random forest, decision tree, and K-nearest neighbor. The models were trained and tested on 50,606 patient hospitalization histories. Unsupervised and supervised data reduction methods (singular value decomposition (SVD) and a neural network embedding layer, respectively) were applied to overcome the challenges of high dimensionality and sparsity of EHR data.

The results provided evidence that, for this application, GRU outperforms the alternatives in terms of accuracy, Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity (recall), specificity, precision, and F1 score. It was the best-performing model, with accuracy between 81% (breast cancer) and 88% (liver cancer) on balanced out-of-sample EHRs. Among the alternatives, multilayer perceptron and LSTM showed comparable performance (accuracies between 78% and 87%), while decision tree was the worst-performing model.

The findings of this study could potentially aid medical professionals in cancer diagnostics, treatment, and prevention. In particular, experiments confirmed that GRU could accurately predict cancer by learning from simplified patient representations produced by an embedding layer or SVD. Therefore, GRU's predictions could be used in early cancer detection, potentially improving patients' survival rates.
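The evaluation measures named above (accuracy, sensitivity, specificity, precision, F1 score, and AUROC) can be made concrete with a minimal sketch. The function names `binary_metrics` and `auroc`, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions, not the study's code or data; the sketch only shows how each measure follows from predicted and true labels for a single binary task (cancer vs. no cancer).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics from hard 0/1 labels (1 = cancer, 0 = no cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall: share of true cases detected
    specificity = ratio(tn, tn + fp)  # share of non-cases correctly cleared
    precision = ratio(tp, tp + fp)    # share of positive calls that are correct
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, balanced example (hypothetical values, not results from the paper).
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(binary_metrics(labels, preds))
    print("AUROC:", auroc(labels, scores))
```

On a balanced test set such as the one described above, accuracy equals the mean of sensitivity and specificity, which is one reason to report all of the measures rather than accuracy alone. AUROC is computed from the raw scores and so does not depend on the chosen threshold.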