Journal of Translational Medicine (Nov 2024)
Machine learning based predictive modeling and risk factors for prolonged SARS-CoV-2 shedding
Abstract
Background: The global outbreak of coronavirus disease 2019 (COVID-19) has been enormously damaging, and prolonged shedding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, previously 2019-nCoV) remains a challenge in the prevention and treatment of COVID-19. However, research on the risk factors for prolonged SARS-CoV-2 shedding is still incomplete.
Methods: In a retrospective analysis of 56,878 patients hospitalized in the Fangcang Shelter Hospital (National Convention and Exhibition Center) in Shanghai, China, we compared patients whose SARS-CoV-2 viral shedding lasted > 12 days with those whose shedding lasted < 12 days. The duration of viral shedding was determined from real-time polymerase chain reaction (RT-PCR) test results and was defined as the interval from the first day of SARS-CoV-2 positivity to the day of SARS-CoV-2 negativity. The extreme gradient boosting (XGBoost) machine learning method was used to build a prediction model for prolonged SARS-CoV-2 shedding and to analyze significant risk factors. Feature filtering with model retraining and Shapley Additive Explanations (SHAP) were then applied to demonstrate and further explain the risk factors for long-term SARS-CoV-2 infection.
Results: We assessed ten features, namely vaccination, hypertension, diabetes, admission cycle threshold (Ct) value, cardio-cerebrovascular disease, gender, age, occupation, symptoms, and family accompaniment, for their impact on prolonged SARS-CoV-2 shedding, and used the XGBoost algorithm to build a predictive model from these features in the cohort of 56,878 hospitalized patients. Six of the ten features were significantly associated with prolonged SARS-CoV-2 shedding, as determined by both the model's feature-importance ranking and the results of model reconstruction: vaccination, hypertension, admission Ct value, gender, age, and family accompaniment.
Conclusions: We developed a predictive model and identified six risk factors associated with prolonged SARS-CoV-2 viral shedding. Our study contributes to identifying and screening individuals with potential long-term SARS-CoV-2 infection. It also provides a reference for future preventive control, for optimizing medical resource allocation and guiding epidemiological prevention, and for guidelines on personal protection against SARS-CoV-2.
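The outcome definition in the abstract (shedding duration measured from the first positive RT-PCR day to the negative day, with groups split at 12 days) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the example dates are hypothetical, the duration is assumed to be a plain difference in calendar days, and because the abstract does not say which group a duration of exactly 12 days belongs to, the sketch leaves that case unlabeled.

```python
from datetime import date
from typing import Optional

# Cut-off stated in the abstract (> 12 days vs. < 12 days).
THRESHOLD_DAYS = 12


def shedding_duration_days(first_positive: date, first_negative: date) -> int:
    """Days from the first positive RT-PCR result to the negative result.

    Assumption: a plain calendar-day difference; the abstract does not
    say whether the end day is counted inclusively.
    """
    if first_negative < first_positive:
        raise ValueError("negative result precedes first positive result")
    return (first_negative - first_positive).days


def is_prolonged(duration_days: int) -> Optional[bool]:
    """True if > 12 days, False if < 12 days, None if exactly 12 days.

    The abstract does not assign the exactly-12-day case to either
    group, so it is returned as None here rather than guessed.
    """
    if duration_days > THRESHOLD_DAYS:
        return True
    if duration_days < THRESHOLD_DAYS:
        return False
    return None


# Hypothetical patient: first positive on 1 April, negative on 15 April.
duration = shedding_duration_days(date(2022, 4, 1), date(2022, 4, 15))
label = is_prolonged(duration)
```

A binary label of this kind would serve as the target for a classifier such as the XGBoost model the abstract describes, with the ten listed features as inputs.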
Keywords