IEEE Access (Jan 2022)
Feature Selection by mRMR Method for Heart Disease Diagnosis
Abstract
Heart disease has become a serious threat to human health in recent years. Without timely diagnosis and treatment, patients often suffer disability or even death. However, diagnostic accuracy depends directly on the experience of individual doctors, and the many factors associated with heart disease place a heavy burden on them, which makes the situation worse. Introducing computer-aided techniques to assist doctors in diagnosis is therefore a feasible way to improve heart disease treatment. At present, researchers usually take the processed dataset (13 features), which doctors selected from the unprocessed dataset (74 features) in the UCI Machine Learning Repository, and apply feature selection methods to it. This is inappropriate because the feature set is already so small. The value of the unprocessed dataset is neglected, even though it could contain latent information. A comprehensive comparison is needed to demonstrate the advantages of the unprocessed dataset. In addition, the incremental feature combination method needs to be verified. Because the minimum Redundancy - Maximum Relevance (mRMR) method has been very successful in feature selection, applying it as a feature filter can enhance classification accuracy. In this research, we therefore introduce mRMR as a filter for feature selection and compare it comprehensively, on several metrics, with methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kendall, and Random Forest, as well as with other research works. The results show that, in most cases, the unprocessed dataset improves the performance of the algorithms. The incremental feature selection method is effective, and mRMR is superior to the other methods: it achieves the highest accuracies while requiring the fewest features. It reaches 100% accuracy with 8 features on the Cleveland dataset, 98.3% accuracy with 14 features on the Hungarian dataset, and 99% accuracy with 9 features on the Long-beach-VA dataset.
Furthermore, we find that some features that doctors regard as useless contribute to classification, a result that deserves attention from doctors.
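As an illustration of the filter idea described above, the following is a minimal sketch of greedy mRMR ranking on discrete features, followed by the nested feature subsets that an incremental selection procedure would evaluate. It is not the implementation used in this work: the difference form of the criterion (relevance minus mean redundancy), the function names `mutual_information`, `mrmr_rank`, and `incremental_subsets`, and the toy data are all assumptions made for the example.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x_vals, x_idx = np.unique(np.ravel(x), return_inverse=True)
    y_vals, y_idx = np.unique(np.ravel(y), return_inverse=True)
    # Joint frequency table, normalized to a joint probability table.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx.ravel(), y_idx.ravel()), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, ny)
    nonzero = joint > 0
    ratio = joint[nonzero] / (px @ py)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(ratio)))


def mrmr_rank(X, y, k):
    """Greedy mRMR ranking (assumed difference form).

    At each step, pick the feature that maximizes
    I(feature; label) - mean over selected s of I(feature; s).
    """
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_features)]
    )
    selected = [int(np.argmax(relevance))]  # start with the most relevant
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean(
                [mutual_information(X[:, j], X[:, s]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


def incremental_subsets(ranking):
    """Nested subsets (top-1, top-2, ...) for incremental evaluation."""
    return [ranking[: i + 1] for i in range(len(ranking))]


# Toy data (hypothetical): the label depends on two independent binary
# features a and b; column 1 duplicates a, column 3 is constant.
a = np.tile([0, 0, 1, 1], 10)
b = np.tile([0, 1, 0, 1], 10)
y = 2 * a + b
X = np.column_stack([a, a.copy(), b, np.zeros_like(a)])

ranking = mrmr_rank(X, y, k=3)
subsets = incremental_subsets(ranking)
print(ranking, subsets)
```

In this toy case the duplicated column is as relevant as the original but fully redundant with it, so the ranking places the independent feature `b` ahead of the duplicate. In practice each nested subset would be passed to a classifier, and the smallest subset with the best validation score would be retained; continuous features would first need to be discretized for this mutual information estimate.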
Keywords