Frontiers in Environmental Science (Oct 2023)
A cascade ensemble-learning model for the deployment at the edge: case on missing IoT data recovery in environmental monitoring systems
Abstract
In recent years, more and more applied industries have relied on data collection by IoT devices. Various IoT devices generate vast volumes of data that require efficient processing. Usually, the intellectual analysis of such data takes place in data centers in cloud environments. However, the problems of transferring large volumes of data and the long wait for a response from the data center for further corrective actions in the system led to the search for new processing methods. One possible option is Edge computing. Intelligent data analysis in the places of their collection eliminates the disadvantages mentioned above, revealing many advantages of using such an approach in practice. However, the Edge computing approach is challenging to implement when different IoT devices collect the independent attributes required for classification/regression. In order to overcome this limitation, the authors developed a new cascade ensemble-learning model for the deployment at the Edge. It is based on the principles of cascading machine learning methods, where each IoT device that collects data performs its analysis based on the attributes it contains. The results of its work are transmitted to the next IoT device, which analyzes the attributes it collects, taking into account the output of the previous device. All independent at-tributes are taken into account in this way. Because of this, the proposed approach provides: 1) The possibility of effective implementation of Edge computing for intelligent data analysis, that is, even before their transmission to the data center; 2) increasing, and in some cases maintaining, classification/regression accuracy at the same level that can be achieved in the data center; 3) significantly reducing the duration of training procedures due to the processing of a smaller number of attributes by each of the IoT devices. The simulation of the proposed approach was performed on a real-world set of IoT data. The missing data recovery task in the atmospheric air state data was solved. The authors selected the optimal parameters of the proposed approach. It was established that the developed model provides a slight increase in prediction accuracy while significantly reducing the duration of the training procedure. However, in this case, the main advantage is that all this happens within the bounds of Edge computing, which opens up several benefits of using the developed model in practice.
Keywords