IEEE Access (Jan 2019)

HLMCC: A Hybrid Learning Anomaly Detection Model for Unlabeled Data in Internet of Things

  • Nusaybah Alghanmi,
  • Reem Alotaibi,
  • Seyed M Buhari

DOI
https://doi.org/10.1109/ACCESS.2019.2959739
Journal volume & issue
Vol. 7
pp. 179492 – 179504

Abstract

The Internet of Things (IoT) is a network of distributed devices or sensors connected through the internet to allow gathering and sharing of data. The data generated by these devices can contain anomalies or abnormal behaviour caused by, for example, attacks or device breakdowns. However, most current anomaly detection systems rely on labelled data, while the class labels for IoT data are usually unavailable. Furthermore, manual labelling is expensive and time-consuming because it requires domain experts. More importantly, the volume of data in the IoT is growing rapidly, creating a need to predict the classification labels for future data. This study proposes a Hybrid Learning Model which uses both Clustering and Classification methods (HLMCC) to automate the labelling process and detect anomalies in IoT data. The model consists of two practical phases: automatic labelling and anomaly detection. First, the HLMCC groups the data into normal and anomaly clusters by adopting Hierarchical Affinity Propagation (HAP) clustering. Second, the labelled data obtained from the clustering phase is used to train Decision Trees (DTs), which then classify future unseen data. The results show that the HLMCC is able to automate the labelling of data, which minimizes human involvement. Moreover, based on average ranks over a wide range of evaluation metrics, HLMCC outperforms both the DTs trained on the originally labelled datasets and the state-of-the-art model. HLMCC achieves the best average ranks among the compared models in terms of False Positive Rate (FPR), recall, precision and the Area Under the Precision-Recall curve (AUCPR), with 1.8, 1.6, 1.8 and 1.8, respectively.
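
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a simple one-dimensional 2-means split stands in for Hierarchical Affinity Propagation, and a depth-1 decision tree (a decision stump) stands in for the Decision Trees. Treating the smaller cluster as the anomaly cluster is an assumption made here for illustration, and the sensor readings and all function names are hypothetical.

```python
from statistics import mean

def two_means_1d(values, iters=20):
    """Stand-in for the clustering phase (NOT HAP): split 1-D values in two."""
    lo, hi = min(values), max(values)  # initial centroids
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, a in zip(values, assign) if a == 0]
        g1 = [v for v, a in zip(values, assign) if a == 1]
        if not g0 or not g1:
            break
        lo, hi = mean(g0), mean(g1)
    return assign

def auto_label(values):
    """Phase 1: label data without human input.
    Assumption: the smaller cluster is the anomaly cluster (label 1)."""
    assign = two_means_1d(values)
    minority = 1 if sum(assign) <= len(assign) / 2 else 0
    return [1 if a == minority else 0 for a in assign]

def fit_stump(values, labels):
    """Phase 2: train a depth-1 decision tree on the automatically
    generated labels. Returns (threshold, anomaly_if_above)."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])] or [points[0]]
    best = None
    for t in candidates:
        for above in (True, False):
            errors = sum(((v > t) == above) != bool(y)
                         for v, y in zip(values, labels))
            if best is None or errors < best[0]:
                best = (errors, t, above)
    return best[1], best[2]

def predict(stump, value):
    """Classify a future, unseen reading: 1 = anomaly, 0 = normal."""
    threshold, above = stump
    return int((value > threshold) == above)

# Hypothetical unlabelled temperature readings with two outliers.
readings = [20.1, 19.8, 20.4, 20.0, 19.9, 20.3, 35.2, 34.8]
labels = auto_label(readings)        # phase 1: automatic labelling
stump = fit_stump(readings, labels)  # phase 2: train the classifier
print(labels)
print(predict(stump, 20.2), predict(stump, 33.0))
```

The point of the split is that the classifier never sees human-provided labels: it learns only from what the clustering phase produced, and can then label new readings without re-clustering.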

Keywords