Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species.

Barbara Kachigunda; Kerrie Mengersen; Devindri I Perera; Grey T Coupland; Johann van der Merwe; Simon McKirdy

doi:10.1371/journal.pone.0272413

PLoS ONE (Jan 2022)

Journal volume & issue: Vol. 17, no. 8, p. e0272413

Abstract

Appropriate inspection protocols and mitigation strategies are critical components of effective biosecurity measures, enabling implementation of sound management decisions. Statistical models to analyze biosecurity surveillance data are integral to this decision-making process. Our research focuses on analyzing border interception biosecurity data collected from a Class A Nature Reserve, Barrow Island, in Western Australia, and the associated covariates describing both spatial and temporal interception patterns. A clustering analysis approach was adopted using a generalization of the popular k-means algorithm appropriate for mixed-type data. The analysis compared the efficiency of clustering using only the numerical data with that of clustering that also included the covariates. Based on numerical data only, three clusters gave an acceptable fit and provided information about the underlying data characteristics. Incorporation of covariates into the model suggested four distinct clusters dominated by physical location and type of detection. Clustering increases the interpretability of complex models and is useful in data mining for highlighting patterns that describe underlying processes in biosecurity and other research areas. Availability of more relevant data would greatly improve the model. Based on the outcomes of our research, we recommend broader use of cluster models in biosecurity data, with testing of these models on more datasets to validate the model choice and identify important explanatory variables.
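
The abstract does not name the specific generalization of k-means that was used. One well-known option for mixed-type data is the k-prototypes algorithm, and the sketch below illustrates that idea only; it should not be read as the authors' implementation. It combines squared Euclidean distance on numeric features with a weighted count of categorical mismatches, then alternates assignment and prototype updates (means for numeric features, modes for categorical ones). The function names, the weight `gamma`, and the toy records (location and detection type labels) are all hypothetical and are not taken from the Barrow Island data.

```python
import random
from collections import Counter


def mixed_distance(num_a, cat_a, num_b, cat_b, gamma):
    """Squared Euclidean distance on the numeric parts plus gamma times
    the number of mismatching categorical attributes."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def k_prototypes(numeric, categorical, k, gamma=1.0, max_iter=100, seed=0):
    """Cluster records described by numeric and categorical features.

    Returns a list of cluster labels and the final prototypes."""
    rng = random.Random(seed)
    n = len(numeric)
    # Start from k distinct records chosen at random (an assumption;
    # other initialisation schemes are possible).
    start = rng.sample(range(n), k)
    protos = [(list(numeric[i]), list(categorical[i])) for i in start]
    labels = [None] * n
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest prototype.
        new_labels = [
            min(
                range(k),
                key=lambda j: mixed_distance(
                    numeric[i], categorical[i], protos[j][0], protos[j][1], gamma
                ),
            )
            for i in range(n)
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop iterating
        labels = new_labels
        # Update step: numeric mean and categorical mode per cluster.
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                continue  # keep the previous prototype for an empty cluster
            num_mean = [
                sum(numeric[i][d] for i in members) / len(members)
                for d in range(len(numeric[0]))
            ]
            cat_mode = [
                Counter(categorical[i][d] for i in members).most_common(1)[0][0]
                for d in range(len(categorical[0]))
            ]
            protos[j] = (num_mean, cat_mode)
    return labels, protos


# Hypothetical toy records: two numeric measurements plus two categorical
# covariates (location, detection type). Not real interception data.
toy_numeric = [[1.0, 0.1], [1.2, 0.0], [0.9, 0.2],
               [8.0, 5.0], [8.3, 5.1], [7.9, 4.8]]
toy_categorical = [["wharf", "insect"], ["wharf", "insect"], ["wharf", "plant"],
                   ["airport", "vertebrate"], ["airport", "vertebrate"],
                   ["airport", "insect"]]

labels, prototypes = k_prototypes(toy_numeric, toy_categorical, k=2)
```

On this toy input the first three and the last three records end up in separate clusters. The weight `gamma` controls how much the categorical covariates influence the grouping relative to the numeric data, which mirrors the abstract's comparison between clustering on numerical data alone and clustering with covariates included.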

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/