A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

Yong Wang; Wenlong Ke; Xiaoling Tao

doi:10.3390/info7010006

Information (Feb 2016)

A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

Yong Wang,
Wenlong Ke,
Xiaoling Tao

Affiliations

Yong Wang: Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Wenlong Ke: Key Laboratory of Cognitive Radio and Information Processing, Guilin University of Electronic Technology, Guilin 541004, China
Xiaoling Tao: Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin 541004, China

DOI: https://doi.org/10.3390/info7010006
Journal volume & issue: Vol. 7, no. 1
p. 6

Abstract

Read online

Currently, with the rapid increasing of data scales in network traffic classifications, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, the execution time was still unsatisfactory with numeral iterative computations during the processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is firstly preprocessed based on Fisher score, and a sequential forward search strategy is employed for subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, on the precondition of keeping the classification accuracy, our method reduces the time cost of modeling and classification, and improves the execution efficiency of feature selection significantly.

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/

About the journal

Abstract

Keywords