Empirical studies on the impact of filter‐based ranking feature selection on security vulnerability prediction

Xiang Chen; Zhidan Yuan; Zhanqi Cui; Dun Zhang; Xiaolin Ju

doi:10.1049/sfw2.12006

IET Software (Feb 2021)

Affiliations

Xiang Chen: School of Information Science and Technology Nantong University Nantong China
Zhidan Yuan: School of Information Science and Technology Nantong University Nantong China
Zhanqi Cui: Computer School Beijing Information Science and Technology University Beijing China
Dun Zhang: School of Information Science and Technology Nantong University Nantong China
Xiaolin Ju: School of Information Science and Technology Nantong University Nantong China

DOI: https://doi.org/10.1049/sfw2.12006
Journal volume & issue: Vol. 15, no. 1
pp. 75 – 89

Abstract

Security vulnerability prediction (SVP) uses machine learning to construct models that identify potentially vulnerable program modules. Previous studies measure the extracted modules with two kinds of features, each from a different point of view. One kind treats traditional software metrics as features, and the other uses text mining to extract term vectors as features. As a result, the gathered SVP data sets often contain numerous features and suffer from the curse of dimensionality. In this article, we mainly investigate the impact of filter‐based ranking feature selection (FRFS) methods on SVP, since other types of feature selection methods incur too much computational cost. In our empirical studies, we first consider three real‐world large‐scale web applications. We then consider seven methods from three FRFS categories and use a random forest classifier to construct SVP models. The final results show that, given a similar code inspection cost, using FRFS can improve the performance of SVP when compared with state‐of‐the‐art baselines. Moreover, we use McNemar's test to perform diversity analysis on the vulnerable modules identified by different FRFS methods, and we are surprised to find that almost all the FRFS methods identify similar vulnerable modules.
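
The two mechanisms named in the abstract, filter-based ranking and McNemar's test, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the seven FRFS methods, so information gain is used here purely as an assumed example of a filter score, and the exact binomial form of McNemar's test is likewise an assumption. The toy data and the names rank_and_select and mcnemar_exact are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_and_select(rows, labels, k):
    """Filter-based ranking: score each feature on its own, sort, keep the top k."""
    n_features = len(rows[0])
    scores = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


def mcnemar_exact(pred_a, pred_b, truth):
    """Exact two-sided McNemar p-value (assumed variant) for two models' predictions."""
    # Count modules that exactly one of the two models classifies correctly.
    only_a = sum(1 for a, b, t in zip(pred_a, pred_b, truth) if a == t and b != t)
    only_b = sum(1 for a, b, t in zip(pred_a, pred_b, truth) if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    tail = sum(math.comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data: 6 modules, 3 binary term features, label 1 = vulnerable.
rows = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
labels = [1, 1, 1, 0, 0, 0]

selected, scores = rank_and_select(rows, labels, k=2)
print("selected feature indices:", selected)
print("information gain per feature:", [round(s, 4) for s in scores])
```

In this sketch the selected column indices would then be used to train the classifier (a random forest in the study), and mcnemar_exact would compare the predictions of models built with two different FRFS methods. A large p-value gives no evidence that the two models differ in which modules they classify correctly.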

Published in IET Software

ISSN: 1751-8806 (Print); 1751-8814 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/ietsfw
