Scientific Data (Nov 2023)

The smarty4covid dataset and knowledge base as a framework for interpretable physiological audio data analysis

  • Konstantia Zarkogianni,
  • Edmund Dervakos,
  • George Filandrianos,
  • Theofanis Ganitidis,
  • Vasiliki Gkatzou,
  • Aikaterini Sakagianni,
  • Raghu Raghavendra,
  • C. L. Max Nikias,
  • Giorgos Stamou,
  • Konstantina S. Nikita

DOI
https://doi.org/10.1038/s41597-023-02646-6
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 19

Abstract

Harnessing the power of Artificial Intelligence (AI) and m-health to detect new bio-markers indicative of the onset and progression of respiratory abnormalities and conditions has attracted considerable scientific and research interest, especially during the COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291), recorded with mobile devices following a crowd-sourcing approach. Other self-reported information is also included (e.g. COVID-19 virus tests), thus providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released in the form of a Web Ontology Language (OWL) knowledge base, enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been used to develop models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework that uses the smarty4covid OWL knowledge base to generate counterfactual explanations for opaque AI-based COVID-19 risk detection models is proposed and validated.