SARS-CoV-2 Detection From Voice

Gadi Pinkas; Yarden Karny; Aviad Malachi; Galia Barkai; Gideon Bachar; Vered Aharonson

doi:10.1109/OJEMB.2020.3026468

IEEE Open Journal of Engineering in Medicine and Biology (Jan 2020)

Affiliations

Gadi Pinkas: Afeka Center of Language Processing, Afeka, Tel Aviv Academic College of Engineering, Tel Aviv-Yafo, Israel
Yarden Karny: Afeka Center of Language Processing, Afeka, Tel Aviv Academic College of Engineering, Tel Aviv-Yafo, Israel
Aviad Malachi: Afeka Center of Language Processing, Afeka, Tel Aviv Academic College of Engineering, Tel Aviv-Yafo, Israel
Galia Barkai: Pediatric Infectious Diseases Unit, Safra Children's Hospital, Sheba Medical Center and Sackler School of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
Gideon Bachar: Department of Otorhinolaryngology, Rabin Medical Center and Sackler School of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
Vered Aharonson: Afeka Center of Language Processing, Afeka, Tel Aviv Academic College of Engineering, Tel Aviv-Yafo, Israel

Journal volume & issue: Vol. 1
pp. 268–274

Abstract

Automated voice-based detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) could facilitate screening for COVID-19. A dataset of cellular phone recordings from 88 subjects was recently collected. The dataset included vocal utterances, speech and coughs that the subjects self-recorded in either hospitals or isolation sites. All subjects underwent nasopharyngeal swabbing at the time of recording and were labelled as SARS-CoV-2 positives or negative controls. The present study harnessed deep learning and speech processing to detect the SARS-CoV-2 positives. A three-stage architecture was implemented: a self-supervised, attention-based transformer generated embeddings from the audio inputs; recurrent neural networks produced specialized sub-models for SARS-CoV-2 classification; and ensemble stacking fused the predictions of the sub-models. Pre-training, bootstrapping and regularization techniques were used to prevent overfitting. A recall of 78% and a probability of false alarm (PFA) of 41% were measured on a test set of 57 recording sessions. A leave-one-speaker-out cross-validation on 292 recording sessions yielded a recall of 78% and a PFA of 30%. These preliminary results suggest that voice-based COVID-19 screening is feasible.
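The two reported metrics and the leave-one-speaker-out protocol can be made concrete with a minimal Python sketch. It assumes binary labels (1 = SARS-CoV-2 positive, 0 = negative control), uses the standard definitions recall = TP / (TP + FN) and PFA = FP / (FP + TN), and treats each recording session as one item tagged with a speaker ID. All function names are illustrative, and the paper's transformer, RNN and stacking pipeline is replaced by a hypothetical threshold classifier (`toy_fit`, `toy_predict`) purely so the evaluation loop can run; the data in the demo are made up.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Sequence, Tuple


def recall_and_pfa(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, PFA) for binary labels, where 1 = SARS-CoV-2 positive.

    recall = TP / (TP + FN): share of positives that were detected.
    PFA    = FP / (FP + TN): share of negative controls flagged as positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    pfa = fp / (fp + tn) if fp + tn else 0.0
    return recall, pfa


def leave_one_speaker_out(
    speakers: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_speaker, train_idx, test_idx), one fold per speaker.

    All sessions of the held-out speaker go to the test fold together, so no
    speaker contributes to both training and testing within a fold.
    """
    by_speaker: Dict[Hashable, List[int]] = defaultdict(list)
    for i, spk in enumerate(speakers):
        by_speaker[spk].append(i)
    for held_out, test_idx in by_speaker.items():
        train_idx = [i for i, spk in enumerate(speakers) if spk != held_out]
        yield held_out, train_idx, test_idx


def loso_evaluate(sessions, labels, speakers, fit: Callable, predict: Callable):
    """Collect out-of-fold predictions for every session, then score them once."""
    preds: List[int] = [0] * len(labels)
    for _, train_idx, test_idx in leave_one_speaker_out(speakers):
        model = fit([sessions[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, sessions[i])
    return recall_and_pfa(labels, preds)


# Hypothetical stand-in classifier: one scalar "feature" per session and a
# threshold halfway between the class means. It replaces the paper's
# transformer/RNN/stacking pipeline purely for illustration.
def toy_fit(xs: List[float], ys: List[int]) -> float:
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def toy_predict(threshold: float, x: float) -> int:
    return 1 if x > threshold else 0


if __name__ == "__main__":
    sessions = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]  # made-up feature values
    labels = [1, 1, 0, 0, 1, 0]
    speakers = ["a", "a", "b", "b", "c", "c"]
    print(loso_evaluate(sessions, labels, speakers, toy_fit, toy_predict))  # prints (1.0, 0.0)
```

Scoring the pooled out-of-fold predictions, rather than averaging per-fold scores, is an assumption of this sketch. With one speaker per fold, many test folds contain only one class, so per-fold recall or PFA would often be undefined.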

Published in IEEE Open Journal of Engineering in Medicine and Biology

ISSN: 2644-1276 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Medicine: Medicine (General): Medical technology
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8782705