IEEE Access (Jan 2021)

Predicting the Risk of Alcohol Use Disorder Using Machine Learning: A Systematic Literature Review

  • Ali Ebrahimi,
  • Uffe Kock Wiil,
  • Thomas Schmidt,
  • Amin Naemi,
  • Anette Sogaard Nielsen,
  • Ghulam Mujtaba Shaikh,
  • Marjan Mansourvar

DOI
https://doi.org/10.1109/ACCESS.2021.3126777
Journal volume & issue
Vol. 9
pp. 151697 – 151712

Abstract

The number of deaths caused by alcohol-related diseases may be reduced by predicting alcohol use disorder (AUD). Many researchers have worked on AUD prediction using machine learning (ML) techniques. However, to the best of our knowledge, no comprehensive systematic literature review (SLR) summarizes the studies on AUD prediction using ML from the last ten years. To address this knowledge gap, this article provides an SLR of academic articles on AUD prediction using ML techniques dated from January 2010 to July 2021. This SLR highlights technical decision analysis related to five aspects: data collection site, characteristics, and type of dataset; data sampling and data pre-processing techniques; feature types and feature engineering techniques; and characteristics of ML techniques and evaluation metrics. Six bibliographic databases were searched, and the identified studies were rigorously reviewed based on the above five aspects. In the selected studies, public datasets were rarely used for AUD prediction. Given that, the current paper identified two different types of data collection sites for review. Imbalanced class distribution in datasets was the primary focus of the pre-processing and sampling steps. Various features, including demographics, family history, drinking behaviour, and electronic health records, were identified as the more widely used AUD prediction features. Filter, wrapper, and embedded methods were identified as the primary feature selection methods. Support vector machine was the most widely employed algorithm for predicting AUD; however, the lack of deep neural network techniques is evident in this field. Moreover, considering gender disparities, early detection of AUD, and identifying trajectories towards AUD are suggested for future work. To evaluate the performance of the prediction approaches, most studies considered the overall accuracy and the area under the receiver operating characteristic curve. Nevertheless, external validation was not performed in any of the selected studies. This paper also discusses challenges and open issues of AUD prediction for future research. This SLR represents a valuable resource for scholars investigating the prediction of AUD.

Keywords