Reliability of crowdsourced data and patient-reported outcome measures in cough-based COVID-19 screening

Hao Xiong; Shlomo Berkovsky; Mohamed Ali Kâafar; Adam Jaffe; Enrico Coiera; Roneel V. Sharan

doi:10.1038/s41598-022-26492-5

Scientific Reports (Dec 2022)

Affiliations

Hao Xiong: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University
Shlomo Berkovsky: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University
Mohamed Ali Kâafar: Department of Computing, Macquarie University
Adam Jaffe: School of Women’s and Children’s Health, Faculty of Medicine, University of New South Wales
Enrico Coiera: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University
Roneel V. Sharan: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University

DOI: https://doi.org/10.1038/s41598-022-26492-5
Journal volume & issue: Vol. 12, no. 1
pp. 1–9

Abstract

Mass community testing is a critical means of monitoring the spread of the COVID-19 pandemic. Polymerase chain reaction (PCR) is the gold standard for detecting the causative severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), but the test is invasive, test centers may not be readily available, and laboratory results can take several days. Various machine learning-based alternatives to PCR screening for SARS-CoV-2 have been proposed, including cough sound analysis. Cough classification models appear to be a robust means of predicting infective status, but collecting reliable PCR-confirmed data for their development is challenging, and recent work has treated unverified crowdsourced data as a viable alternative. In this study, we report experiments that assess cough classification models trained (i) using data from PCR-confirmed COVID subjects and (ii) using data from individuals self-reporting their infective status. We compare the performance of both sets of models on PCR-confirmed data. Models trained on PCR-confirmed data perform better than those trained on patient-reported data. Models using PCR-confirmed data also exploit more stable predictive features and converge faster. Crowdsourced cough data are less reliable than PCR-confirmed data for developing predictive models for COVID-19, which raises concerns about the utility of patient-reported outcome data in developing other clinical predictive models when better gold-standard data are available.

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
