IEEE Access (Jan 2024)
A Real-Life Evaluation of Supervised and Semi-Supervised Machine Learning Approaches for Indirect Estimation of Indoor Occupancy
Abstract
Occupancy information is essential for space management, energy efficiency, and, during the COVID-19 pandemic, crowd control. Obtaining labeled occupancy data is challenging because of hardware limitations, privacy considerations, and cost. This study demonstrates the benefits of Semi-Supervised Learning (SSL), which requires less labeled data than other Machine Learning (ML) methods, for occupancy estimation in enclosed spaces. It presents an empirical comparison between supervised ML and SSL models in three real-life university classrooms under uncontrolled conditions. Data were collected for three weeks in each classroom using an in-house developed Internet of Things (IoT) device that measures air temperature, relative humidity, and atmospheric pressure. Ground truth records were gathered through manual logging of occupancy levels. The datasets averaged 2350 entries each, with only 280 labeled instances per dataset. Support Vector Machine (SVM), Random Forest (RF), and Multi-Layer Perceptron (MLP) were used to define a performance baseline for supervised ML. Self-Training (ST) and Label Propagation (LP) were tested for SSL. ST outperformed the baseline models (SVM, RF, MLP), reaching the highest average accuracy of 87.33%, compared with 86.66% for SVM. These results demonstrate the effectiveness of SSL for indirect occupancy estimation while reducing the need for extensive data collection and labeling.
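The Self-Training idea mentioned above can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic (two well-separated clusters over three standardized features), the base learner is a hypothetical nearest-centroid classifier standing in for whatever model the authors wrapped, and the names `self_train`, `threshold`, and `max_rounds` are assumptions made for illustration. Roughly one point in eight is labeled, loosely mirroring the 280 of 2350 ratio reported above.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data are reproducible


def make_point(center):
    """Synthetic 3-feature sample (stand-in for temperature, humidity, pressure)."""
    return [random.gauss(center, 0.5) for _ in range(3)]


def centroids(X, y):
    """Mean feature vector per class, computed from labeled rows only."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label is None:
            continue
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_with_confidence(cents, row):
    """Return (label, confidence) from a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(row, m)) for c, m in cents.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(X, y, threshold=0.9, max_rounds=10):
    """Iteratively pseudo-label unlabeled rows the current model is confident about."""
    y = list(y)  # copy so the caller's labels are not modified
    for _ in range(max_rounds):
        cents = centroids(X, y)  # refit on true labels plus pseudo-labels so far
        added = 0
        for i, row in enumerate(X):
            if y[i] is None:
                label, conf = predict_with_confidence(cents, row)
                if conf >= threshold:
                    y[i] = label
                    added += 1
        if added == 0:  # nothing new was confident enough; stop early
            break
    return centroids(X, y), y


# Synthetic two-class data: class 0 centered at 0.0, class 1 centered at 4.0.
truth = [i % 2 for i in range(200)]
X = [make_point(4.0 * label) for label in truth]

# Keep the true label for about one point in eight; None marks "unlabeled".
partial = [t if i % 16 in (0, 1) else None for i, t in enumerate(truth)]

model, final = self_train(X, partial)
print("unlabeled before:", sum(l is None for l in partial))
print("unlabeled after: ", sum(l is None for l in final))
```

The confidence threshold is the main design choice here: a higher value adds fewer but cleaner pseudo-labels per round, while a lower value grows the training set faster at the risk of reinforcing early mistakes.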
Keywords