IEEE Access (Jan 2024)
An Efficient Hybrid Feature Selection Technique Toward Prediction of Suspicious URLs in IoT Environment
Abstract
With the growth of the IoT, a vast number of devices are connected to the web. Consequently, both users and devices are susceptible to deception by intruders through malicious links, which can lead to the disclosure of personal information. Hence, it is essential to identify suspicious URLs before accessing them. Although researchers have proposed numerous URL detection approaches, machine learning-based techniques stand out as particularly effective because of their ability to detect zero-day attacks; however, their success depends on the type and dimensionality of the features used. Earlier research employed only the lexical features of URLs for classification in order to attain high detection speeds. However, this approach cannot retrieve comprehensive information about a website. Hence, to enhance the security of IoT devices, both the lexical and the page content-based features of URLs must be considered. To improve model performance, researchers extract informative features using different Feature Selection Techniques (FSTs), including filter and wrapper methods. However, the challenges faced by individual FSTs, such as high resource and time demands and difficulty handling high-dimensional datasets, have driven the development of hybrid FSTs. Nevertheless, the combination of a filter-based FST and a wrapper search-based Genetic Algorithm (GA) has not been used to identify malicious URLs or to detect malicious links in research on IoT devices. Therefore, the proposed approach leverages a variety of features and explores a hybrid FST that produces an optimal feature subset for evaluating boosting estimators with specific hyperparameter configurations. Our proposed approach effectively fills the research gap left by previous methodologies, reaching a performance of 99% while keeping computational costs minimal, which makes it suitable for detecting malignant URLs on resource-constrained devices.
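The hybrid idea summarized in the abstract can be sketched in a few dozen lines. In this two-stage pipeline, a cheap filter first shrinks the feature space, then a GA wrapper searches over subsets of the surviving features. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the filter or the data, so the filter score (absolute Pearson correlation with the label), the nearest-centroid classifier used as the wrapper's fitness model (standing in for the boosting estimators), the subset-size penalty, all GA parameters, and the synthetic toy data are hypothetical choices made for illustration.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def filter_top_k(X, y, k):
    """Filter stage (assumed score): rank features by |r| with the label, keep top k."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def centroid_accuracy(X_tr, y_tr, X_val, y_val, feats):
    """Hypothetical wrapper model: nearest-centroid accuracy on the chosen features."""
    if not feats:
        return 0.0
    centroids = {}
    for label in set(y_tr):
        rows = [r for r, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in feats]
    correct = 0
    for r, t in zip(X_val, y_val):
        pred = min(centroids, key=lambda c: sum(
            (r[j] - m) ** 2 for j, m in zip(feats, centroids[c])))
        correct += pred == t
    return correct / len(y_val)


def ga_wrapper(X_tr, y_tr, X_val, y_val, candidates, pop_size=20,
               generations=30, mutation_rate=0.1, size_penalty=0.01, seed=0):
    """Wrapper stage: GA over bitmasks of the filtered candidate features."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(mask):
        feats = [f for f, bit in zip(candidates, mask) if bit]
        # Validation accuracy minus a small (assumed) penalty per selected feature.
        return centroid_accuracy(X_tr, y_tr, X_val, y_val, feats) - size_penalty * len(feats)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children  # parents survive unchanged (elitism)
    best = max(pop, key=fitness)
    return [f for f, bit in zip(candidates, best) if bit]


def make_toy_data(n, seed):
    """Hypothetical data: features 0 and 1 depend on the label, features 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [label * 3 + rng.gauss(0, 0.5), label * -2 + rng.gauss(0, 0.5)]
        X.append(informative + [rng.gauss(0, 1) for _ in range(4)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_toy_data(200, seed=1)
    X_val, y_val = make_toy_data(100, seed=2)
    candidates = filter_top_k(X_tr, y_tr, k=4)
    selected = ga_wrapper(X_tr, y_tr, X_val, y_val, candidates)
    print("filter kept:", candidates)
    print("GA selected:", sorted(selected))
```

On the toy data the two informative features (indices 0 and 1) should head the filter ranking and dominate the GA's choice. In a real pipeline the fitness function would train the chosen boosting estimator, ideally with cross-validation. That training step is where most of a wrapper method's cost lies, which is why the filter stage is used first to keep the GA's search space small.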
Keywords