Atmospheric Measurement Techniques (Apr 2019)

Application of high-dimensional fuzzy k-means cluster analysis to CALIOP/CALIPSO version 4.1 cloud–aerosol discrimination

  • S. Zeng,
  • M. Vaughan,
  • Z. Liu,
  • C. Trepte,
  • J. Kar,
  • A. Omar,
  • D. Winker,
  • P. Lucker,
  • Y. Hu,
  • B. Getzewich,
  • M. Avery

DOI
https://doi.org/10.5194/amt-12-2261-2019
Journal volume & issue
Vol. 12
pp. 2261 – 2285

Abstract


This study applies fuzzy k-means (FKM) cluster analyses to a subset of the parameters reported in the CALIPSO lidar level 2 data products in order to classify the detected layers as either clouds or aerosols. The results are used to assess the reliability of the cloud–aerosol discrimination (CAD) scores reported in the version 4.1 release of the CALIPSO data products. FKM is an unsupervised learning algorithm, whereas the CALIPSO operational CAD algorithm (COCA) takes a highly supervised approach. Despite these substantial computational and architectural differences, our statistical analyses show that the FKM classifications agree with the COCA classifications for more than 94 % of the cases in the troposphere. This high degree of agreement arises because the lidar-measured signatures of most clouds and aerosols are naturally distinct, so objective methods can independently and effectively separate the two classes in most cases. Classification differences occur most often in complex scenes (e.g., evaporating water cloud filaments embedded in dense aerosol) or when observing diffuse features that occur only intermittently (e.g., volcanic ash in the tropical tropopause layer). Owing to their differing algorithm uncertainties, the two methods examined in this study together establish bounds on overall classification correctness. In addition to comparing the outputs of the two algorithms, we discuss sampling, data training, performance measurements, fuzzy linear discriminants, defuzzification, error propagation, and the key parameters in feature type discrimination with the FKM method, in order to better understand the utility and limits of applying clustering algorithms to space lidar measurements.
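As a rough illustration of the unsupervised clustering and defuzzification steps mentioned above, the following is a minimal sketch of textbook fuzzy k-means (Bezdek-style fuzzy c-means) with Euclidean distance. The fuzzifier m = 2, the farthest-first initialization, defuzzification by maximum membership, all function names, and the synthetic two-class data are assumptions made for illustration. They are not the configuration used in the paper. Because cluster numbering is arbitrary, the agreement helper scores a two-cluster result against reference labels under both possible cluster-to-class mappings. This is one simple way to compute an agreement fraction like the one quoted above.

```python
import math
import random


def memberships(point, centroids, m):
    """Fuzzy membership of one point in each cluster (values sum to 1)."""
    d = [math.dist(point, c) for c in centroids]
    if min(d) == 0.0:
        # Point coincides with one or more centroids: share membership among them.
        hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
        total = sum(hits)
        return [h / total for h in hits]
    e = 2.0 / (m - 1.0)
    # Standard FKM update: u_j = 1 / sum_l (d_j / d_l)^(2/(m-1))
    return [1.0 / sum((dj / dl) ** e for dl in d) for dj in d]


def fuzzy_kmeans(points, k, m=2.0, max_iter=100, tol=1e-6):
    """Return (centroids, memberships) for k fuzzy clusters; m > 1 is the fuzzifier."""
    # Farthest-first initialization (an assumption; random starts are also common).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    dim = len(points[0])
    for _ in range(max_iter):
        u = [memberships(p, centroids, m) for p in points]
        # Centroid update: mean of the points weighted by membership**m.
        new_centroids = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            tw = sum(w)
            new_centroids.append(
                [sum(wi * p[c] for wi, p in zip(w, points)) / tw for c in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    # Recompute memberships so they match the final centroids.
    return centroids, [memberships(p, centroids, m) for p in points]


def defuzzify(u):
    """Hard label = index of the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


def two_class_agreement(labels, reference):
    """Agreement of a 2-cluster result with 0/1 reference labels under the
    better of the two possible cluster-to-class mappings."""
    same = sum(a == b for a, b in zip(labels, reference)) / len(labels)
    return max(same, 1.0 - same)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, illustrative 2-D features (NOT CALIOP data).
    pts = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
    pts += [(rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)) for _ in range(50)]
    ref = [0] * 50 + [1] * 50
    _, u = fuzzy_kmeans(pts, k=2)
    print(two_class_agreement(defuzzify(u), ref))  # 1.0 for these well-separated blobs
```

The memberships retain the "how confident" information that a hard classifier discards. A point whose largest membership is only slightly above 1/k sits near a cluster boundary, which is where disagreements with a supervised scheme would be expected to concentrate.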
In general, we find that both FKM and COCA classification uncertainties are only minimally affected by noise in the CALIPSO measurements, though both algorithms can be challenged by especially complex scenes containing mixtures of discrete layer types. Our analyses show that attenuated backscatter and color ratio are the driving factors that separate water clouds from aerosols; backscatter intensity, depolarization, and mid-layer altitude are most useful in discriminating between aerosols and ice clouds; and the joint distribution of backscatter intensity and depolarization ratio is critically important for distinguishing ice clouds from water clouds.
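The parameters named above are layer-level quantities derived from the lidar profiles. The following is a minimal sketch of how such a feature vector might be assembled before clustering. It assumes uniform range-bin thickness within a layer and common CALIOP-style definitions:

- Integrated attenuated backscatter is the sum of 532 nm attenuated backscatter times bin thickness.
- Volume depolarization ratio is summed perpendicular over summed parallel 532 nm signal.
- Attenuated color ratio is summed 1064 nm over summed 532 nm signal.

The z-score standardization step and all function and field names are hypothetical. The exact feature set, transforms, and scaling used for FKM in the paper may differ.

```python
import statistics
from dataclasses import dataclass


@dataclass
class LayerFeatures:
    integrated_backscatter_532: float  # sr^-1
    depol_ratio: float                 # dimensionless
    color_ratio: float                 # dimensionless
    mid_layer_altitude_km: float


def layer_features(beta_532, beta_532_perp, beta_1064, dz_km, top_km, base_km):
    """Layer features from per-bin attenuated backscatter (km^-1 sr^-1).

    Assumes every range bin inside the layer has the same thickness dz_km
    (a simplification; CALIOP's vertical resolution changes with altitude).
    """
    s532 = sum(beta_532)
    s_perp = sum(beta_532_perp)
    s_par = s532 - s_perp  # parallel channel = total minus perpendicular
    return LayerFeatures(
        integrated_backscatter_532=s532 * dz_km,
        depol_ratio=s_perp / s_par,
        color_ratio=sum(beta_1064) / s532,
        mid_layer_altitude_km=0.5 * (top_km + base_km),
    )


def standardize(rows):
    """Z-score each column so features with different units get comparable weight."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((x - mu) / sd for x, mu, sd in zip(r, means, sds)) for r in rows]
```

Standardizing is one way to keep a feature with a large numeric range (such as altitude in km) from dominating Euclidean distances relative to integrated backscatter, whose values are typically orders of magnitude smaller.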