Journal of Big Data (Oct 2021)

Highly accurate and efficient two phase-intrusion detection system (TP-IDS) using distributed processing of HADOOP and machine learning techniques

  • Abhijit Dnyaneshwar Jadhav,
  • Vidyullatha Pellakuri

DOI
https://doi.org/10.1186/s40537-021-00521-y
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 22

Abstract

Read online

Abstract Network security and data security are the biggest concerns now a days. Every organization decides their future business process based on the past and day to day transactional data. This data may consist of consumer’s confidential data, which needs to be kept secure. Also, the network connections when established with the external communication devices or entities, a care should be taken to authenticate these and block the unwanted access. This consists of identification of the malicious connection nodes and identification of normal connection nodes. For that, we use a continuous monitoring of the network input traffic to recognize the malicious connection request called as intrusion and this type of monitoring system is called as an Intrusion detection system (IDS). IDS helps us to protect our network and data from insecure and malicious network connections. Many such systems exists in the real time scenario, but they have critical issues of performance like accuracy and efficiency. These issues are addressed as a part of this research work of IDS using machine learning techniques and HDFS. The TP-IDS is designed in two phases for increasing accuracy. In phase I of TP-IDS, Support Vector Machine (SVM) and k Nearest Neighbor (kNN) are used. In phase II of TP-IDS, Decision Tree (DT) and Naïve Bayes (NB) are used, where phase II is the validation phase of the system for increasing accuracy. Also, both the phases are having Hadoop distributed file system underlying data storage and processing architecture, which allows parallel processing to increase the speed of the system and hence achieve the efficiency in TP-IDS.

Keywords