Journal of Marine Science and Engineering (May 2024)

Machine Learning-Based Anomaly Detection on Seawater Temperature Data with Oversampling

  • Hangoo Kang,
  • Dongil Kim,
  • Sungsu Lim

DOI
https://doi.org/10.3390/jmse12050807
Journal volume & issue
Vol. 12, no. 5
p. 807

Abstract

Read online

This study deals with a method for anomaly detection in seawater temperature data using machine learning methods with oversampling techniques. Data were acquired from 2017 to 2023 using a Conductivity–Temperature–Depth (CTD) system in the Pacific Ocean, Indian Ocean, and Sea of Korea. The seawater temperature data consist of 1414 profiles including 1218 normal and 196 abnormal profiles. This dataset has an imbalance problem in which the amount of abnormal data is insufficient compared to that of normal data. Therefore, we generated abnormal data with oversampling techniques using duplication, uniform random variable, Synthetic Minority Oversampling Technique (SMOTE), and autoencoder (AE) techniques for the balance of data class, and trained Interquartile Range (IQR)-based, one-class support vector machine (OCSVM), and Multi-Layer Perceptron (MLP) models with a balanced dataset for anomaly detection. In the experimental results, the F1 score of the MLP showed the best performance at 0.882 in the combination of learning data, consisting of 30% of the minor data generated by SMOTE. This result is a 71.4%-point improvement over the F1 score of the IQR-based model, which is the baseline of this study, and is 1.3%-point better than the best-performing model among the models without oversampling data.

Keywords