Journal of Big Data (Jun 2024)

Correlation-based outlier detection for ships’ in-service datasets

  • Prateek Gupta,
  • Adil Rasheed,
  • Sverre Steen

DOI
https://doi.org/10.1186/s40537-024-00937-2
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 21

Abstract

Read online

Abstract With the advent of big data, it has become increasingly difficult to obtain high-quality data. Solutions are required to remove undesired outlier samples from massively large datasets. Ship operators rely on high-frequency in-service datasets recorded onboard the ships for monitoring the performance of their fleet. The large in-service datasets are known to be highly unbalanced, making it difficult to adopt ordinary outlier detection techniques, as they would also result in the removal of rare but quite valuable data samples. Thus, the current work proposes to establish a correlation-based outlier detection scheme for ships’ in-service datasets using two well-known dimensionality reduction methods, namely, Principal Component Analysis (PCA) and Autoencoders. The correlation-based approach detects samples which do not fit the prominent correlations present in the dataset and avoids misidentifying the rare but correlation-following samples in the sparse regions of data domain. The study also attempts to provide the physical meaning of the latent variables obtained using PCA. The effectiveness of the proposed methodology is proven using an actual dataset recorded onboard a ship.

Keywords