Information Processing in Agriculture (Aug 2014)
Time-series prediction of shellfish farm closure: A comparison of alternatives
Abstract
Shellfish farms are closed to harvesting when microbial pollutants are present. Such pollutants typically enter the water in rainfall runoff from the various land uses in a catchment. Experts currently use a number of observable parameters (river flow, rainfall, salinity) as proxies to decide when to close farms. We propose treating farm closure as a time-series prediction problem in which closures are predicted from short-term historical rainfall data alone. Time-series event prediction consists of two steps: (i) feature extraction and (ii) prediction. This setting raises two data mining challenges: (i) which feature extraction method best captures the rainfall pattern over successive days that leads to the opening or closure of a farm, and (ii) how best to handle the class imbalance that arises because closure events occur infrequently. In this paper we analyse and compare different combinations of balancing methods (under-sampling and over-sampling), feature extraction methods (cluster profile, curve fitting, Fourier Transform, Piecewise Aggregate Approximation, and Wavelet Transform) and learning algorithms (neural network, support vector machine, k-nearest neighbour, decision tree, and Bayesian network) for predicting closure events. From this comparison we identify the combination of techniques that most accurately predicts shellfish farm closure from rainfall.
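The two-step pipeline and the balancing step described above can be illustrated with a minimal sketch. It pairs one of the listed feature extraction methods (Piecewise Aggregate Approximation) with simple random over-sampling. The function names, the six-day window length, the rainfall values and the labels are all hypothetical and chosen only for illustration; they are not the data or the implementation used in the paper.

```python
import random


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of the rarer classes until all classes
    have as many rows as the most frequent class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical daily rainfall (mm) over six-day windows; 1 = farm closed.
rain_windows = [
    [0, 0, 2, 4, 30, 50],
    [0, 1, 0, 0, 2, 1],
    [3, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]
closed = [1, 0, 0, 0]

# Step (i): feature extraction, here three two-day means per window.
features = [paa(window, 3) for window in rain_windows]

# Balancing: closures are rare, so minority rows are duplicated.
balanced_x, balanced_y = random_oversample(features, closed)
```

The balanced feature vectors would then be passed to any of the learning algorithms listed above for step (ii). In practice, over-sampling is normally applied to the training split only, so that duplicated closure events do not leak into the evaluation data.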
Keywords