Scientific Reports (Jun 2022)

Compilation of parasitic immunogenic proteins from 30 years of published research using machine learning and natural language processing

  • Stephen J. Goodswen,
  • Paul J. Kennedy,
  • John T. Ellis

DOI
https://doi.org/10.1038/s41598-022-13790-1
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 23

Abstract

The World Health Organisation reported in 2020 that six of the top 10 sources of death in low-income countries are parasites. Parasites are microorganisms in a relationship with a larger organism, the host, and they acquire all benefits at the host’s expense. A disease develops if the parasitic infection disrupts the normal functioning of the host. This disruption can range from mild to severe, including death. Humans and livestock continue to be challenged by established and emerging infectious disease threats. Vaccination is the most efficient tool for preventing current and future threats. Immunogenic proteins sourced from the disease-causing parasite are worthwhile vaccine components (subunits) because of their reliable safety and manufacturing capacity. Publications with ‘subunit vaccine’ in their title have accumulated into the thousands over the last three decades. However, there are possibly thousands more that report immunogenicity results without mentioning ‘subunit’ and/or ‘vaccine’. The exact number is unclear because keywords in publications are not standardised. The aim of this study is to use machine learning and natural language processing to identify parasite proteins that, according to the scientific literature of the last 30 years, induce a protective response in an animal model. The source code written to fulfil this aim and the resulting vaccine candidate list are made available.