Impact of Dataset and Model Parameters on Machine Learning Performance for the Detection of GPS Spoofing Attacks on Unmanned Aerial Vehicles

Tala Talaei Khoei; Shereen Ismail; Khair Al Shamaileh; Vijay Kumar Devabhaktuni; Naima Kaabouch

doi:10.3390/app13010383

Applied Sciences (Dec 2022)

Affiliations

Tala Talaei Khoei: School of Electrical Engineering and Computer Science, University of North Dakota, Grand Forks, ND 58202, USA
Shereen Ismail: School of Electrical Engineering and Computer Science, University of North Dakota, Grand Forks, ND 58202, USA
Khair Al Shamaileh: Electrical and Computer Engineering Department, Purdue University Northwest, Hammond, IN 46323, USA
Vijay Kumar Devabhaktuni: Electrical and Computer Engineering Department, The University of Maine, Orono, ME 04469, USA
Naima Kaabouch: School of Electrical Engineering and Computer Science, University of North Dakota, Grand Forks, ND 58202, USA

DOI: https://doi.org/10.3390/app13010383
Journal volume & issue: Vol. 13, no. 1, p. 383

Abstract

GPS spoofing attacks are a severe threat to unmanned aerial vehicles. These attacks manipulate the true state of the unmanned aerial vehicles, potentially misleading the system without raising alarms. Several techniques, including machine learning, have been proposed to detect these attacks. Most studies applied machine learning models without identifying the best hyperparameters, using feature selection and importance techniques, or ensuring that the dataset used is unbiased and balanced. Moreover, no current studies have discussed the impact of model parameters and dataset characteristics on the performance of machine learning models. This paper fills this gap by evaluating the impact of hyperparameters, regularization parameters, dataset size, correlated features, and imbalanced datasets on the performance of six commonly used machine learning techniques: Classification and Regression Decision Tree, Artificial Neural Network, Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine. Thirteen features extracted from legitimate and simulated GPS attack signals are used to perform this investigation. The evaluation was performed in terms of four metrics: accuracy, probability of misdetection, probability of false alarm, and probability of detection. The results indicate that hyperparameters, regularization parameters, correlated features, dataset size, and imbalanced datasets adversely affect a machine learning model’s performance. The results also show that the Classification and Regression Decision Tree classifier has an accuracy of 99.99%, a probability of detection of 99.98%, a probability of misdetection of 0.2%, and a probability of false alarm of 1.005%, after removing correlated features and using tuned parameters in a balanced dataset. Under similar conditions, Random Forest can achieve an accuracy of 99.94%, a probability of detection of 99.6%, a probability of misdetection of 0.4%, and a probability of false alarm of 1.01%.
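
The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard confusion-matrix definitions (the abstract does not give the formulas), with label 1 meaning a spoofed signal and label 0 a legitimate one. The function name `detection_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute the four metrics from binary labels (1 = spoofed, 0 = legitimate).

    Assumed definitions (standard, not quoted from the paper):
      accuracy      = (TP + TN) / (TP + TN + FP + FN)
      p_detection   = TP / (TP + FN)
      p_misdetect   = FN / (TP + FN)
      p_false_alarm = FP / (FP + TN)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate signals passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate signals flagged

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "p_detection": tp / (tp + fn),
        "p_misdetect": fn / (tp + fn),
        "p_false_alarm": fp / (fp + tn),
    }


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed definitions, the probability of detection and the probability of misdetection are computed over the same set of attack samples, so they sum to one. The probability of false alarm is computed over legitimate samples only, which is why class balance in the dataset matters when comparing it with accuracy.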

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
