Zhongguo quanke yixue (Apr 2024)

Design Features and Methodological Quality of Researches about Prediction Models Based on Machine Learning in Primary Care: a Scoping Review

  • ZHONG Jinjia, LI Wentao, HUANG Yafang, WU Hao

DOI
https://doi.org/10.12114/j.issn.1007-9572.2023.0561
Journal volume & issue
Vol. 27, no. 10
pp. 1271 – 1276

Abstract

Background Research on machine learning-based prediction models in primary care has developed rapidly in recent years, but few studies have examined the design features and methodological quality of this research.
Objective To systematically summarize and analyze the design features and methodological quality of research on machine learning-based prediction models in primary care.
Methods PubMed, Embase, CNKI, and Wanfang Data were searched for studies on machine learning-based prediction models in primary care published from database inception to 2023-02-21. Descriptive methods were used to summarize the basic characteristics of the included studies, types of prediction models, sample size, handling of missing values, types of machine learning algorithms, model performance metrics and predictive performance, and model validation methods.
Results A total of 30 studies involving 106 prediction models were included, 17 of which were published between 2021 and 2023. The most common research topics were respiratory disease (6 studies), tumours (4 studies), and outpatient appointments (3 studies). Sample size exceeded 1,000 in 26 studies (86.67%, 95% CI 68.36%-95.64%), and 7 studies used machine learning methods to handle missing values. Tree-based machine learning algorithms were used in 65 prediction models, with random forest the most frequently used (32.08%, 95% CI 23.53%-41.95%). The area under the ROC curve (AUC) or the concordance (C) statistic served as the discrimination metric in 61 prediction models (57.55%, 95% CI 47.57%-66.97%), but only 14 prediction models reported calibration (13.21%, 95% CI 7.67%-21.50%). Discrimination was good for most of the 106 prediction models, but 92 were rated as being at high risk of bias (86.79%, 95% CI 78.50%-92.33%). Only 7 studies externally validated their prediction models.
Conclusion Research on machine learning-based prediction models in primary care has increased over the past three years, mainly addressing topics such as respiratory disease, tumours, and outpatient appointments. Sample size and the handling of missing values differ considerably across the 106 prediction models. Most of the models show good discrimination, but most were not externally validated, and the overall risk of bias is relatively high.
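
The abstract does not say how the 95% confidence intervals for the reported proportions were computed. As a minimal sketch, assuming a Wilson score interval with continuity correction (an assumption, not a method confirmed by the authors), the function below appears to reproduce the reported intervals for 26 of 30 studies and 14 of 106 models to within rounding. The function name `wilson_cc_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc_interval(successes: int, total: int,
                       confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval with continuity correction for a proportion.

    Assumption: this is one plausible method for the intervals quoted in
    the abstract; the source does not name the method actually used.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")

    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 2 * (total + z * z)

    lower_root = z * z - 2 - 1 / total + 4 * p * (total * (1 - p) + 1)
    upper_root = z * z + 2 - 1 / total + 4 * p * (total * (1 - p) - 1)

    lower = (2 * total * p + z * z - 1 - z * sqrt(lower_root)) / denom
    upper = (2 * total * p + z * z + 1 + z * sqrt(upper_root)) / denom

    # By convention the bounds are pinned at 0 and 1 for extreme counts.
    if successes == 0:
        lower = 0.0
    if successes == total:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)


# Counts taken from the Results: studies with sample size over 1,000,
# and prediction models rated at high risk of bias.
for label, k, n in [("sample size > 1,000", 26, 30),
                    ("high risk of bias", 92, 106)]:
    low, high = wilson_cc_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.2%}, 95% CI {low:.2%}-{high:.2%}")
```

With only 30 included studies, the interval for a study-level proportion is wide, which is worth keeping in mind when comparing it with the narrower model-level intervals based on 106 models.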

Keywords