Tehran University Medical Journal (Dec 2021)
Automated detection of coronavirus disease (COVID-19) by using data-mining techniques: a brief report
Abstract
Background: Clinical practice generates vast amounts of patient data that remain unanalyzed. Turning this raw data into usable information could save many lives. Data mining is an efficient way to analyze such large volumes of data: by drawing on accurate knowledge of past cases, it can predict future outcomes and provide new insights into disease diagnosis and prevention. Many data mining methods exist, so choosing a suitable one is important. Coronavirus disease (COVID-19) has become one of the deadliest diseases in the world, and early diagnosis has a significant impact on preventing death. This study aims to extract the key indicators of the disease and to find the data mining methods that best improve the accuracy of COVID-19 diagnosis.
Methods: To diagnose COVID-19 with high accuracy, a complete workflow based on data mining methods was proposed, consisting of these steps: data preprocessing, indicator selection, model building, performance measurement, and presentation of results. Data and related indicators of patients with COVID-19 were collected from Afzalipour Hospital in Kerman and Ali Ebn Abi Taleb Hospital in Rafsanjan. Prediction models were built and tested using different combinations of the disease indicators and seven data mining methods. To identify the best key indicators, three criteria (accuracy, validation, and F-value) were applied; to identify the best data mining methods, the accuracy and validation criteria were considered. For each data mining method, the criteria were measured independently, and all results were reported for analysis. Finally, the key indicators and data mining methods that diagnose COVID-19 with high accuracy were extracted.
Results: Nine key indicators and three data mining methods were obtained. Experimental results show that the selected key indicators combined with the best-performing data mining method, the support vector machine (SVM), attain an accuracy of 83.19% for the diagnosis of coronavirus disease.
Conclusion: Using the key indicators and data mining methods obtained in this study, coronavirus disease can be diagnosed with high accuracy in patients presenting with different clinical indicators.
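The subset-testing step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier is a simple stand-in for the seven data mining methods (including SVM), the column names `fever`, `cough`, and `noise` and all data values are hypothetical, and reading "F-value" as the F1 score is an assumption. It only shows the general idea of scoring every combination of candidate columns by accuracy and F1 and ranking the results.

```python
from itertools import combinations


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier: assign each test row to the closest class mean."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_X]


def rank_feature_subsets(train, test, names, subset_size):
    """Score every column subset of the given size; best (accuracy, F1) first."""
    (train_X, train_y), (test_X, test_y) = train, test
    results = []
    for cols in combinations(range(len(names)), subset_size):
        tr = [[row[c] for c in cols] for row in train_X]
        te = [[row[c] for c in cols] for row in test_X]
        pred = nearest_centroid_predict(tr, train_y, te)
        subset = tuple(names[c] for c in cols)
        results.append((subset, accuracy(test_y, pred), f1_score(test_y, pred)))
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


# Hypothetical toy data, invented for illustration only (not from the study).
NAMES = ["fever", "cough", "noise"]
TRAIN = ([[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 1], [0, 0, 0]],
         [1, 1, 1, 0, 0, 0])
TEST = ([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]],
        [1, 1, 0, 0])

ranking = rank_feature_subsets(TRAIN, TEST, NAMES, subset_size=1)
for subset, acc, f1 in ranking:
    print(subset, round(acc, 2), round(f1, 2))
```

In a realistic setting, the stand-in classifier would be replaced by each candidate method in turn and the single train/test split by cross-validation, so that the ranking does not depend on one partition of the data.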