Systematic Review of Using Machine Learning in Imputing Missing Values

Mustafa Alabadla; Fatimah Sidi; Iskandar Ishak; Hamidah Ibrahim; Lilly Suriani Affendey; Zafienas Che Ani; Marzanah A. Jabar; Umar Ali Bukar; Navin Kumar Devaraj; Ahmad Sobri Muda; Anas Tharek; Noritah Omar; M. Izham Mohd Jaya

doi:10.1109/ACCESS.2022.3160841

IEEE Access (Jan 2022)

Affiliations

Mustafa Alabadla: Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Fatimah Sidi: Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Iskandar Ishak: Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Hamidah Ibrahim: Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Lilly Suriani Affendey: Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Zafienas Che Ani: Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Marzanah A. Jabar: Department of Software Engineering and Information System, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Umar Ali Bukar: Department of Software Engineering and Information System, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Navin Kumar Devaraj: Department of Family Medicine, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Ahmad Sobri Muda: Department of Radiology, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Anas Tharek: Department of Radiology, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
Noritah Omar: Department of English, Faculty of Modern Languages and Communication, Universiti Putra Malaysia (UPM), Serdang, Selangor, Malaysia
M. Izham Mohd Jaya: Department of Software Engineering, Faculty of Computing, Universiti Malaysia Pahang (UMP), Pekan, Pahang, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2022.3160841
Journal volume & issue: Vol. 10
pp. 44483 – 44502

Abstract

Missing data are a universal data quality problem in many domains, leading to misleading analysis and inaccurate decisions. Much research has investigated the different mechanisms of missing data and the proper techniques for handling various data types. In the last decade, machine learning has been used in place of conventional methods to address the problem of missing values more efficiently. Studying and analyzing recently proposed machine learning methods makes it possible to highlight key advances in accuracy, performance, and time consumed. This study aimed to help data analysts and researchers address the limitations of machine learning imputation methods by conducting a systematic literature review that provides a comprehensive overview of using such methods to impute missing values. Newly proposed machine learning approaches to data imputation are analyzed and summarized to assist researchers in selecting a suitable machine learning method based on several factors and settings. The review covered research studies published between 2016 and 2021 on adopting machine learning to impute missing values, focusing on their strengths and limitations. A total of 684 research articles, retrieved from various scientific databases using search engines, were analyzed, and 94 of them were selected as primary studies. Finally, several recommendations were given to guide future researchers in applying machine learning to impute missing values.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
