Earth System Science Data (Jun 2024)
Characterizing clouds with the CCClim dataset, a machine learning cloud class climatology
Abstract
We present the new Cloud Class Climatology (CCClim) dataset, which quantifies the global distribution of established morphological cloud types over 35 years. CCClim combines active and passive sensor data with machine learning (ML) and provides a new opportunity to improve the understanding of clouds and their related processes. CCClim is based on cloud property retrievals from the European Space Agency's (ESA) Cloud_cci dataset and adds, at 1° resolution, the relative occurrences of eight major cloud types designed to be similar to those defined by the World Meteorological Organization (WMO). The ML framework used to obtain the cloud types is trained on data from multiple satellites in the afternoon constellation (A-Train). Using multiple spaceborne sensors reduces the impact of single-sensor problems, such as the difficulty passive sensors have in detecting thin cirrus or the small footprint of active sensors. We leverage this to generate sufficient labeled data to train supervised ML models. CCClim's almost gapless global coverage from 1982 to 2016 allows process-oriented analyses of clouds on a climatological timescale. Similarly, the moderate spatial and temporal resolutions make it a lightweight dataset while enabling straightforward comparison to climate models. CCClim creates multiple opportunities to study clouds, of which we sketch out a few examples. Along with the cloud-type frequencies, CCClim contains the cloud properties used as inputs to the ML framework, so that all cloud types can be associated with relevant physical quantities. CCClim can also be combined with other datasets, such as reanalysis data, to assess the dynamical regime favoring the occurrence of a specific cloud type in association with its properties.
Additionally, we show an example of how to evaluate a global climate model by comparing CCClim with cloud types obtained by applying the same ML method used to create CCClim to output from the icosahedral nonhydrostatic atmosphere model (ICON-A). CCClim can be accessed via the following digital object identifier: https://doi.org/10.5281/zenodo.8369202 (Kaps et al., 2023b).
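As an illustration of how relative cloud-type occurrences on a regular 1° grid might be summarized, the following minimal sketch computes a cosine-latitude (area) weighted global mean. It is not taken from the CCClim code or file layout: the function name `area_weighted_mean`, the grid construction, and the synthetic field are assumptions made for this example only, and gaps are represented here by `None`.

```python
import math


def area_weighted_mean(field, lats_deg):
    """Cosine-latitude weighted mean of a (lat x lon) field.

    field     -- list of rows, one per latitude; entries are floats or None
                 (None marks a gap and is skipped)
    lats_deg  -- latitude of each row's cell centre, in degrees
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        # On a regular lat-lon grid, cell area is proportional to cos(latitude).
        weight = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no valid grid cells")
    return weighted_sum / weight_total


# Hypothetical 1-degree grid: cell centres at -89.5, -88.5, ..., 89.5.
lats = [-89.5 + i for i in range(180)]
n_lon = 360

# Synthetic relative frequency of one cloud type (assumed values, not real data).
uniform_freq = [[0.25] * n_lon for _ in lats]
global_mean = area_weighted_mean(uniform_freq, lats)
```

Weighting by the cosine of latitude keeps the many small polar cells of a regular grid from dominating the average; an unweighted mean would overstate the contribution of high latitudes.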