Confronting data sparsity to identify potential sources of Zika virus spillover infection among primates
Barbara A. Han,
Subhabrata Majumdar,
Flavio P. Calmon,
Benjamin S. Glicksberg,
Raya Horesh,
Abhishek Kumar,
Adam Perer,
Elisa B. von Marschall,
Dennis Wei,
Aleksandra Mojsilović,
Kush R. Varshney
Affiliations
Barbara A. Han
Cary Institute of Ecosystem Studies, Box AB Millbrook, NY 12545, USA; Corresponding author.
Subhabrata Majumdar
University of Florida Informatics Institute, 432 Newell Drive, CISE Bldg E251, Gainesville, FL 32611, USA
Flavio P. Calmon
Harvard University, 29 Oxford St, Cambridge, MA 02138, USA
Benjamin S. Glicksberg
Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, 94158, USA
Raya Horesh
IBM Research, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
Abhishek Kumar
Cary Institute of Ecosystem Studies, Box AB Millbrook, NY 12545, USA; University of Florida Informatics Institute, 432 Newell Drive, CISE Bldg E251, Gainesville, FL 32611, USA; Harvard University, 29 Oxford St, Cambridge, MA 02138, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, 94158, USA; IBM Research, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA; Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA; IBM Watson Media & Weather, 550 Assembly St, Columbia, SC 29201, USA
Adam Perer
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
Elisa B. von Marschall
IBM Watson Media & Weather, 550 Assembly St, Columbia, SC 29201, USA
Dennis Wei
IBM Research, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
Aleksandra Mojsilović
IBM Research, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
Kush R. Varshney
IBM Research, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
The recent Zika virus (ZIKV) epidemic in the Americas ranks among the largest outbreaks in modern times. Like other mosquito-borne flaviviruses, ZIKV circulates in sylvatic cycles among primates that can serve as reservoirs of spillover infection to humans. Identifying sylvatic reservoirs is critical to mitigating spillover risk, but relevant surveillance and biological data remain limited for this and most other zoonoses. We confronted this data sparsity by combining a machine learning method, Bayesian multi-label learning, with a multiple imputation method on primate traits. The resulting models distinguished flavivirus-positive primates with 82% accuracy and suggest that species posing the greatest spillover risk are also among the best adapted to human habitations. Given pervasive data sparsity describing animal hosts, and the virtual guarantee of data sparsity in scenarios involving novel or emerging zoonoses, we show that computational methods can be useful in extracting actionable inference from available data to support improved epidemiological response and prevention.
Keywords: Predictive analytics, Flavivirus, Arbovirus, Non-human primate, Machine learning, Bayesian multi-task learning, Imputation, Neotropical, Spillover, Spillback, Ecology, Surveillance