BMC Infectious Diseases (Jan 2024)

Identification of associated risk factors for serological distribution of hepatitis B virus via machine learning models

  • Ning Yao,
  • Yang Liu,
  • Jiawei Xu,
  • Qing Wang,
  • Quanhua Zhou,
  • Yue Wang,
  • Dong Yi,
  • Yazhou Wu

DOI
https://doi.org/10.1186/s12879-023-08911-8
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 9

Abstract

Background: A provincial-level serosurvey was launched to obtain an updated seroprevalence of hepatitis B virus (HBV) infection in the general population aged 1–69 years in Chongqing and to assess the risk factors for HBV infection, so that persons with chronic hepatitis B (CHB) can be screened effectively.

Methods: A total of 1828 individuals aged 1–69 years were investigated, and hepatitis B surface antigen (HBsAg), antibody to HBsAg (HBsAb), and antibody to hepatitis B core antigen (HBcAb) were tested. Logistic regression and three machine learning (ML) algorithms, namely random forest (RF), support vector machine (SVM), and stochastic gradient boosting (SGB), were developed for the analysis.

Results: The HBsAg prevalence in the total population was 3.83%; among persons aged 1–14 years and 15–69 years, it was 0.24% and 4.89%, respectively. As many as 95.18% (770/809) of adults were unaware of their occult HBV infection. In the logistic regression model, age, region, and immunization history were statistically associated with HBcAb prevalence. The prediction accuracies of the proposed RF, SVM, and SGB models were 0.717, 0.727, and 0.725, respectively.

Conclusions: Logistic regression combined with ML models can help screen for the risk factors for HBV infection and identify high-risk populations with CHB.
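As a quick arithmetic check on the Results, the short sketch below recomputes the reported share of unaware adults from the two counts given in the abstract (770 of 809). The helper name `percent` is hypothetical and is not taken from the paper; it only illustrates how a count pair maps to the stated percentage.

```python
def percent(numerator: int, denominator: int, digits: int = 2) -> float:
    """Return numerator/denominator as a percentage, rounded to `digits` places.

    Hypothetical helper for illustration; not part of the study's analysis code.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * numerator / denominator, digits)


# Counts as reported in the abstract: 770 of 809 adults were unaware.
unaware_pct = percent(770, 809)
print(f"{unaware_pct:.2f}%")
```

The recomputed value agrees with the 95.18% stated in the abstract.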

Keywords