Frontiers in Marine Science (Nov 2022)

Research on outlier detection in CTD conductivity data based on cubic spline fitting

  • Long Yu,
  • Jia Sun,
  • Yanliang Guo,
  • Baohua Zhang,
  • Guangbing Yang,
  • Liang Chen,
  • Xia Ju,
  • Fanlin Yang,
  • Xuejun Xiong,
  • Xianqing Lv

DOI
https://doi.org/10.3389/fmars.2022.1030980
Journal volume & issue
Vol. 9

Abstract

Outlier detection is key to the quality control of marine survey data. Previous methods for detecting outliers in Conductivity-Temperature-Depth (CTD) data, such as the Wild Edit method and the Median Filter Combined with Maximum Deviation method, mostly set a threshold based on statistics. Values greater than the threshold are treated as outliers, but there is no clear specification for selecting the threshold, so multiple attempts are required. The process is time-consuming and inefficient, and the results have high false negative and false positive rates. In response to this problem, we propose an outlier detection method for CTD conductivity data based on a physical constraint: the continuity of seawater. The method constructs a cubic spline fitting function, based on the independent points scheme and cubic spline interpolation, to fit the conductivity data. The points with the maximum fitting residual are flagged as outliers. The fitting stops when the optimal number of iterations is reached, which is determined automatically from the minimum value of the sequence of maximum fitting residuals. Tests of the accuracy and stability of the method on example data show that it has a lower false negative rate (17.88%) and false positive rate (0.24%) than the other methods: the rates are 56.96% and 2.19% for the Wild Edit method, and 23.28% and 0.31% for the Median Filter Combined with Maximum Deviation method. The Cubic Spline Fitting method is simple to operate, gives clear and definite results, and better solves the problem of detecting conductivity outliers.
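The fit-and-flag loop described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the "independent points" are taken to be evenly spaced knots whose values are found by least squares, the spline is a natural cubic spline, and the iteration count `n_iter` is supplied by hand, because the abstract does not give enough detail to reproduce the automatic stopping rule. All function names, the synthetic profile, and the definitions of the two error rates are hypothetical.

```python
import bisect
import math

def second_derivs(t, v):
    """Second derivatives of the natural cubic spline through knots (t, v)."""
    n = len(t)
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    cp, dp, m = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):  # forward sweep of the tridiagonal solve
        d = 6.0 * ((v[i + 1] - v[i]) / h[i] - (v[i] - v[i - 1]) / h[i - 1])
        denom = 2.0 * (h[i - 1] + h[i]) - h[i - 1] * cp[i - 1]
        cp[i] = h[i] / denom
        dp[i] = (d - h[i - 1] * dp[i - 1]) / denom
    for i in range(n - 2, 0, -1):  # back substitution; end values stay 0
        m[i] = dp[i] - cp[i] * m[i + 1]
    return m

def spline_eval(t, v, x):
    """Evaluate the natural cubic spline through (t, v) at each point of x."""
    m = second_derivs(t, v)
    out = []
    for xi in x:
        i = min(max(bisect.bisect_right(t, xi) - 1, 0), len(t) - 2)
        h, a, b = t[i + 1] - t[i], t[i + 1] - xi, xi - t[i]
        out.append((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h)
                   + (v[i] / h - m[i] * h / 6.0) * a
                   + (v[i + 1] / h - m[i + 1] * h / 6.0) * b)
    return out

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def fit(depth, cond, knots):
    """Least-squares knot values (assumed independent points scheme), then the fitted curve."""
    k = len(knots)
    # The spline is linear in the knot values, so unit vectors give the basis columns.
    cols = [spline_eval(knots, [float(i == j) for i in range(k)], depth) for j in range(k)]
    ata = [[sum(p * q for p, q in zip(cols[r], cols[c])) for c in range(k)] for r in range(k)]
    aty = [sum(p * y for p, y in zip(cols[r], cond)) for r in range(k)]
    return spline_eval(knots, solve(ata, aty), depth)

def flag_outliers(depth, cond, n_knots, n_iter):
    """Refit n_iter times, each time flagging the point with the largest residual."""
    lo, hi = min(depth), max(depth)
    knots = [lo + (hi - lo) * j / (n_knots - 1) for j in range(n_knots)]
    keep, flagged, max_resid = list(range(len(depth))), [], []
    for _ in range(n_iter):
        d, c = [depth[i] for i in keep], [cond[i] for i in keep]
        resid = [abs(ci - fi) for ci, fi in zip(c, fit(d, c, knots))]
        worst = max(range(len(keep)), key=resid.__getitem__)
        max_resid.append(resid[worst])  # the sequence of maximum fitting residuals
        flagged.append(keep.pop(worst))
    return flagged, max_resid

def error_rates(flagged, true_outliers, n_points):
    """Assumed definitions: missed / true outliers, and wrongly flagged / good points."""
    flagged, true_outliers = set(flagged), set(true_outliers)
    fnr = len(true_outliers - flagged) / len(true_outliers)
    fpr = len(flagged - true_outliers) / (n_points - len(true_outliers))
    return fnr, fpr

# Synthetic smooth "conductivity" profile with three injected spikes (made-up data).
depth = [float(i) for i in range(100)]
cond = [3.0 + 0.5 * math.exp(-z / 40.0) for z in depth]
spikes = {20: 0.5, 55: -0.4, 80: 0.3}
for idx, s in spikes.items():
    cond[idx] += s
flagged, max_resid = flag_outliers(depth, cond, n_knots=8, n_iter=3)
fnr, fpr = error_rates(flagged, spikes, len(depth))
```

The sketch returns `max_resid` so that the sequence of maximum fitting residuals can be inspected; choosing the iteration count from that sequence, as the method does automatically, is left out here rather than guessed.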

Keywords