Heliyon (Dec 2024)

Predictive analytics technique based on hybrid sampling to manage unbalanced data in smart cities

  • Ayushi Chahal,
  • Preeti Gulia,
  • Nasib Singh Gill,
  • Mohammad Yahya,
  • Mohd Anul Haq,
  • Mohammed Aleisa,
  • Abdullah Alenizi,
  • Arfat Ahmad Khan,
  • Piyush Kumar Shukla

Journal volume & issue
Vol. 10, no. 24
p. e39275

Abstract

A smart city is considered smart because it can make decisions on its own. Artificial intelligence needs large amounts of data from the physical world to make correct decisions. IoT sensor devices collect data from their surroundings, which is then used for predictive analytics. The collected data may be balanced or unbalanced. Using unbalanced data for decision-making without any pre-processing may lead to damaging results. This paper proposes a novel predictive analytics technique to manage unbalanced data. A pipeline is designed using Principal Component Analysis (PCA), a hybrid sampling method, and a Machine Learning (ML) prediction method. SMOTE + ENN, a hybrid data balancing method, is used to bring unbalanced data to a balanced state. The ML method is applied to form clusters and make predictions over the dataset. A large Smart City IoT dataset with 405,184 records is used in this study. The proposed technique is used to predict the presence of a person in the vicinity of IoT devices. Accuracy, precision, recall, F1-score, and the Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) curve are used to evaluate the proposed approach. For cluster 0, the accuracy, precision, recall, F1-score, and AUC obtained using the proposed technique are 0.79, 1.0, 0.79, 0.87, and 0.88, respectively; for cluster 1, they are 0.86, 0.99, 0.86, 0.92, and 0.92. In view of these encouraging results, the proposed technique may prove to be a good choice for supporting decision-making in different real-life application domains.
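The hybrid balancing step named in the abstract (SMOTE oversampling followed by ENN cleaning) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names `smote` and `enn`, the toy two-class data, and the choice of k = 3 are assumptions made for illustration, the ENN step uses the majority-vote variant, and the PCA and ML prediction stages of the pipeline are omitted. In practice a library such as imbalanced-learn, which provides `SMOTEENN` in `imblearn.combine`, would normally be used instead.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(X, y, minority, k=3, rng=None):
    """Simplified SMOTE for two classes: add synthetic minority points,
    each interpolated between a minority point and one of its k nearest
    minority neighbours, until both classes have the same size."""
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(minority_pts)
        others = [p for p in minority_pts if p is not base]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return list(X) + synthetic, list(y) + [minority] * n_needed


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote variant): drop every sample
    whose label disagrees with the majority of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [(math.dist(x, X[j]), y[j]) for j in range(len(X)) if j != i]
        others.sort(key=lambda pair: pair[0])
        votes = Counter(lbl for _, lbl in others[:k])
        if votes.most_common(1)[0][0] == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


# Hypothetical toy data: 40 majority points (class 0), 8 minority points (class 1).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]
y = [0] * 40 + [1] * 8

X_over, y_over = smote(X, y, minority=1, k=3, rng=rng)  # oversample the minority
X_bal, y_bal = enn(X_over, y_over, k=3)                 # clean noisy samples

print("original:   ", Counter(y))
print("after SMOTE:", Counter(y_over))
print("after ENN:  ", Counter(y_bal))
```

SMOTE alone equalizes the class counts but can place synthetic points inside the majority region; the ENN pass then removes samples from either class that sit among neighbours of the other class, which is why the final counts are usually close to balanced rather than exactly equal.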

Keywords