Atmospheric Measurement Techniques (Aug 2021)
Data imputation in in situ-measured particle size distributions by means of neural networks
Abstract
In air quality research, often only size-integrated particle mass concentrations are considered as indicators of aerosol particles. However, mass concentrations do not convey the full picture of the fractionated size distribution, in which particles of different diameters (Dp) deposit differently in the respiratory system and cause different kinds of harm. Aerosol size distribution measurements rely on a variety of techniques to classify the aerosol size and measure the size distribution. The ambient size distribution is then determined from the raw data using a suite of inversion algorithms. However, the inversion problem is quite often ill-posed and challenging to solve. Because of instrumental shortcomings and inversion limitations, imputation methods for fractionated particle size distributions are important for filling gaps and replacing negative values. The study at hand involves a merged particle size distribution from a scanning mobility particle sizer (NanoSMPS) and an optical particle sizer (OPS), covering the aerosol size distributions from 0.01 to 0.42 µm (electrical mobility equivalent size) and 0.3 to 10 µm (optical equivalent size), together with meteorological parameters collected at an urban background region in Amman, Jordan, in the period of 1 August 2016–31 July 2017. We develop and evaluate feed-forward neural network (FFNN) approaches to estimate number concentrations at a particular size bin with (1) meteorological parameters, (2) number concentrations at other size bins and (3) both of the above as input variables. Two layers with 10–15 neurons are found to be the optimal configuration. Performance is poorer at the lower edge (0.01<Dp<0.02 µm), in the mid-range region (0.15<Dp<0.5 µm) and at the upper edge (6<Dp<10 µm). For the edges at both ends, the number of neighbouring size bins is limited, and the detection efficiency of the corresponding instruments is lower than for the other size bins.
A distinct performance drop over the overlapping mid-range region is due to a deficiency in the merging algorithm. Another plausible reason for the poorer performance for finer particles is that they are removed from the atmosphere more effectively than coarser particles, so the relationships between the input variables and the small particles are more dynamic. An observable overestimation is also found in the early morning for ultrafine particles, followed by a distinct underestimation before midday. In winter, possibly because of sensor drift and interference artefacts, the estimation performance is poorer than in the other seasons. The FFNN approach based on meteorological parameters gives poorer results with 5 min data (R² = 0.22–0.58) than with data of coarser time resolution (R² = 0.66–0.77). The FFNN approach using the number concentrations at the other size bins can serve as an alternative way to replace negative numbers in the raw size distribution dataset thanks to its high accuracy and reliability (R² = 0.97–1). This negative-number filling approach can maintain a symmetric distribution of errors and complement the existing ill-posed built-in algorithm in particle sizer instruments.
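
As an illustration of approach (2), the following is a minimal NumPy sketch of a feed-forward network that estimates the number concentration in one size bin from the other bins and uses the estimate to replace negative values. It is not the implementation used in this study. The data are synthetic, and the bin count, the log10 transform, the tanh activations, the two hidden layers of 12 neurons, the plain gradient-descent training and all function names are assumptions made only for this example.

```python
import numpy as np

def init_params(sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(1.0 / n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer (tanh hidden units, linear output)."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts

def train(params, x, y, lr=0.02, epochs=3000):
    """Full-batch gradient descent on the mean squared error."""
    n = len(x)
    for _ in range(epochs):
        acts = forward(params, x)
        grad = 2.0 * (acts[-1] - y) / n              # gradient at the linear output
        for i in reversed(range(len(params))):
            w, b = params[i]
            grad_w = acts[i].T @ grad
            grad_b = grad.sum(axis=0)
            if i > 0:
                # back-propagate through the tanh that produced acts[i]
                grad = (grad @ w.T) * (1.0 - acts[i] ** 2)
            params[i] = (w - lr * grad_w, b - lr * grad_b)
    return params

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for a size distribution: 400 time steps x 8 size bins.
rng = np.random.default_rng(0)
n_rows, n_bins, target = 400, 8, 3
position = np.linspace(-1.0, 1.0, n_bins)
level = rng.normal(3.0, 0.3, (n_rows, 1))
slope = rng.normal(0.0, 0.1, (n_rows, 1))
conc = 10.0 ** (level + slope * position + rng.normal(0.0, 0.02, (n_rows, n_bins)))

# Mimic inversion artefacts: overwrite some target-bin values with negatives.
bad = np.zeros(n_rows, dtype=bool)
bad[rng.choice(n_rows, size=40, replace=False)] = True
true_values = conc[:, target].copy()
conc[bad, target] = -1.0

# Inputs are the log10 concentrations of the other bins; train on valid rows only.
others = np.delete(np.arange(n_bins), target)
x_all = np.log10(conc[:, others])
x_mean, x_std = x_all[~bad].mean(axis=0), x_all[~bad].std(axis=0)
x_scaled = (x_all - x_mean) / x_std
y_valid = np.log10(conc[~bad, target])[:, None]
y_mean, y_std = y_valid.mean(), y_valid.std()

params = init_params([n_bins - 1, 12, 12, 1], rng)
params = train(params, x_scaled[~bad], (y_valid - y_mean) / y_std)

# Replace the negative values with the network estimate.
pred_log = forward(params, x_scaled[bad])[-1][:, 0] * y_std + y_mean
filled = conc.copy()
filled[bad, target] = 10.0 ** pred_log

r2 = r2_score(np.log10(true_values[bad]), pred_log)
print(f"R2 on the filled rows (log10 scale): {r2:.3f}")
```

Because the synthetic bins are strongly correlated by construction, the score obtained here says nothing about real instruments; the sketch only shows the mechanics of training on valid rows and filling the flagged ones.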