Measurement: Sensors (Dec 2022)
Analysis of Hadoop log file in an environment for dynamic detection of threats using machine learning
Abstract
Log files are a significant piece of records that once properly analyzed, can yield helpful information. The usage of cloud services for log analysis is almost inevitable due to the greater rate of production and quantity of hardware and sensors that produce logs. The idea is to adopt machine learning mechanisms to “understand” what such a test suite's “anticipated” behavior is. This implies the system needs to be able to learn the difference among a Hadoop log file that contains a threat that does not really, based on past patterns. Many programs that really can effectively monitor the Hadoop cluster have been developed. The bulk of these tools collect relevant information from every cluster node and send it to be processed. The majority of these diagnostic tools were post-execution techniques commonly. This paper gives an experimental evaluation of the various log analysts used in Hadoop for error monitoring and detection. MapReduce is configured using these files. The algorithms were tested on a parser set of labelled log files, and their efficiency was assessed by looking for abnormal events in the Hadoop log files and executing an experimentation using the methods. The algorithms used were Term Frequency Inverse Document Frequency, Random Forest, and Local Outlier Factor. Later, the clustering with K-Means and Principal component analysis is utilized to extract certain relevant details from the data, monitor groups of bits of data to find aberrant events. The Term Frequency Inverse Document Frequency strategy surpassed the other two techniques in predicting threat occurrences in data, according to the findings. The findings would help developers in locating abnormal occurrences without having to go through the Hadoop log file row by row individually. The framework generates events that behave in a different way than the rest of such events in the log and leads the mechanism to fail to work.