Environment International (Mar 2019)
Metabarcoding and machine learning analysis of environmental DNA in ballast water arriving to hub ports
Abstract
While ballast water has long been linked to the global transport of invasive species, little is known about its microbiome. Herein, we used 16S rRNA gene sequencing and metabarcoding to perform the most comprehensive microbiological survey of ballast water arriving to hub ports to date. In total, we characterized 41 ballast, 20 harbor, and 6 open ocean water samples from four world ports (Shanghai, China; Singapore; Durban, South Africa; Los Angeles, California). In addition, we cultured Enterococcus and E. coli to evaluate adherence to International Maritime Organization standards for ballast discharge. Five of the 41 vessels – all of which were loaded in China – did not comply with standards for at least one indicator organism. Dominant bacterial taxa of ballast water at the class level were Alphaproteobacteria, Gammaproteobacteria, and Bacteroidia. Ballast water samples were composed of significantly lower proportions of Oxyphotobacteria than either ocean or harbor samples. Linear discriminant analysis (LDA) effect size (LEfSe) and machine learning were used to identify and test potential biomarkers for classifying sample types (ocean, harbor, ballast). Eight candidate biomarkers were used to achieve 81% (k nearest neighbors) to 88% (random forest) classification accuracy. Further research of these biomarkers could aid the development of techniques to rapidly assess ballast water origin.
Keywords: Ballast water, Microbiome, Environmental DNA, Biomarker, Machine learning, High throughput sequencing
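As an illustration of the classification step, the following is a minimal sketch of k nearest neighbors classification of sample types from eight biomarker abundances, scored by leave-one-out cross-validation. It is not the authors' pipeline: the abundance values are synthetic, the class means are arbitrary, and the function names (`make_synthetic_samples`, `knn_predict`, `leave_one_out_accuracy`), the choice of k = 3, and the validation scheme are all assumptions made for this example. Only the sample counts (41 ballast, 20 harbor, 6 ocean) and the number of biomarkers are taken from the abstract. The random forest classifier is not shown.

```python
import math
import random
from collections import Counter

N_BIOMARKERS = 8  # the abstract reports eight candidate biomarkers


def make_synthetic_samples(counts, seed=0):
    """Generate made-up relative abundances. This is NOT the study's data."""
    rng = random.Random(seed)
    samples = []
    for class_index, (label, n) in enumerate(counts.items()):
        for _ in range(n):
            # Assumption: each class has a different, arbitrary mean
            # abundance per biomarker, with Gaussian noise around it.
            features = [
                rng.gauss(0.1 + 0.05 * ((class_index + j) % 3), 0.02)
                for j in range(N_BIOMARKERS)
            ]
            samples.append((features, label))
    return samples


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Classify each sample using all other samples as the training set."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(samples)


# Sample counts per type follow the abstract; the values are synthetic.
samples = make_synthetic_samples({"ballast": 41, "harbor": 20, "ocean": 6})
accuracy = leave_one_out_accuracy(samples)
print(f"leave-one-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the synthetic classes are well separated by construction, the accuracy printed here says nothing about the 81% to 88% reported above; the sketch only shows the shape of the computation.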