An Accuracy-Maximization Approach for Claims Classifiers in Document Content Analytics for Cybersecurity

Kimia Ameri; Michael Hempel; Hamid Sharif; Juan Lopez Jr.; Kalyan Perumalla

doi:10.3390/jcp2020022

Journal of Cybersecurity and Privacy (Jun 2022)

Affiliations

Kimia Ameri: Department of Electrical & Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68182, USA
Michael Hempel: Department of Electrical & Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68182, USA
Hamid Sharif: Department of Electrical & Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68182, USA
Juan Lopez Jr.: Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Kalyan Perumalla: Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA

DOI: https://doi.org/10.3390/jcp2020022
Journal volume & issue: Vol. 2, no. 2
pp. 418 – 443

Abstract

This paper presents our research approach and findings on maximizing the accuracy of our feature-claims classifier for cybersecurity literature analytics, and introduces the resulting model, ClaimsBERT. Its architecture, selected after extensive evaluation of different approaches, concatenates a feature map with a Bidirectional Encoder Representations from Transformers (BERT) model. We discuss the deployment of this new concept and the research insights that led to the selection of Convolutional Neural Networks for its feature mapping. We also present results showing that ClaimsBERT outperforms all other evaluated approaches. This new claims classifier is an essential processing stage within our vetting framework, which aims to improve the cybersecurity of industrial control systems (ICS). Furthermore, to maximize the accuracy of ClaimsBERT, we propose an approach for selecting the optimal architecture and determining optimized hyperparameters, in particular the learning rate, number of convolutions, filter sizes, activation function, and number of dense layers, as well as the number of neurons and the dropout rate for each layer. Fine-tuning these hyperparameters increased classification accuracy from 76%, obtained with the original BertForSequenceClassification model, to 97% with ClaimsBERT.
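
To make the hyperparameter selection step concrete, the following is a minimal sketch of an exhaustive grid search over the hyperparameters named in the abstract. It is not the authors' procedure, which the abstract does not specify. The candidate values, the names `SEARCH_SPACE`, `iter_configs`, `grid_search`, and `toy_evaluate`, and the scoring function itself are all hypothetical placeholders; in practice `evaluate` would train a model with the given configuration and return its validation accuracy.

```python
from itertools import product

# Hypothetical search space: the hyperparameter names come from the abstract,
# but every candidate value below is an illustrative assumption.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 5e-5],
    "num_conv_layers": [1, 2],
    "filter_size": [2, 3, 5],
    "activation": ["relu", "tanh"],
    "num_dense_layers": [1, 2],
    "neurons_per_layer": [64, 128],
    "dropout_rate": [0.1, 0.3],
}


def iter_configs(space):
    """Yield every combination of values in the search space as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(space, evaluate):
    """Return (best_config, best_score) for a caller-supplied scoring function.

    `evaluate` takes one configuration dict and returns a score where higher
    is better, for example validation accuracy after training.
    """
    best_config, best_score = None, float("-inf")
    for config in iter_configs(space):
        score = evaluate(config)
        if score > best_score:  # strict comparison keeps the first of any ties
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Placeholder score, NOT a real training run (assumption for the demo).

    It simply prefers a learning rate near 2e-5 and a dropout rate near 0.1.
    """
    return -abs(config["learning_rate"] - 2e-5) - abs(config["dropout_rate"] - 0.1)


if __name__ == "__main__":
    best, score = grid_search(SEARCH_SPACE, toy_evaluate)
    print(best, score)
```

An exhaustive grid grows multiplicatively with each added hyperparameter, so for expensive models a random or staged search over the same space is a common alternative.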

Published in Journal of Cybersecurity and Privacy

ISSN: 2624-800X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General)
Website: https://www.mdpi.com/journal/jcp
