Improved Parallel Random Forest Algorithm Combining Information Theory and Norm

MAO Yimin, GENG Junhao

doi:10.3778/j.issn.1673-9418.2010064

Jisuanji kexue yu tansuo (May 2022)

Affiliations

MAO Yimin, GENG Junhao: School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China

DOI: https://doi.org/10.3778/j.issn.1673-9418.2010064
Journal volume & issue: Vol. 16, no. 5
pp. 1064 – 1075

Abstract

MapReduce-based random forest algorithms for big data suffer from too many redundant and irrelevant features, training features that carry little information, and low parallelization efficiency. To address these problems, this paper proposes a parallel random forest algorithm based on information theory and norm (PRFITN). First, the algorithm uses the DRIGFN (dimension reduction based on information gain and Frobenius norm) strategy to reduce the number of redundant and irrelevant features. Second, it proposes a feature grouping strategy based on information theory (FGSIT): features are grouped according to FGSIT, and stratified sampling is used to ensure the information content of the training features when each decision tree in the random forest is constructed, which improves classification accuracy. Finally, to improve the parallel efficiency of the cluster, a redistribution of key-value pairs (RSKP) strategy is presented that distributes key-value pairs quickly and evenly and obtains the global classification results. Experimental results show that the algorithm achieves better classification performance in big data environments, especially on datasets with many features.
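
To make the information-theoretic part of the abstract concrete, the following is a minimal single-machine sketch of two generic ideas it mentions: scoring discrete features by information gain, and drawing training features by stratified sampling from a high-gain group and a low-gain group. It is not the paper's DRIGFN or FGSIT procedure. The Frobenius-norm step and the MapReduce parallelization are omitted because the abstract does not specify them, and the names `information_gain`, `stratified_feature_sample`, the threshold value, and the toy data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    n = len(labels)
    by_value = {}
    for value, label in zip(column, labels):
        by_value.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the feature's values.
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional


def stratified_feature_sample(gains, n_high, n_low, threshold, rng):
    """Sample features separately from a high-gain and a low-gain group.

    The two-group split and the threshold are illustrative assumptions,
    not the grouping rule defined in the paper.
    """
    high = [f for f, g in gains.items() if g >= threshold]
    low = [f for f, g in gains.items() if g < threshold]
    return (rng.sample(high, min(n_high, len(high)))
            + rng.sample(low, min(n_low, len(low))))


# Toy data: feature "a" determines the label, feature "b" is uninformative.
labels = [0, 0, 1, 1]
features = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}

gains = {name: information_gain(col, labels) for name, col in features.items()}
chosen = stratified_feature_sample(gains, n_high=1, n_low=1,
                                   threshold=0.5, rng=random.Random(0))
print(gains, chosen)
```

Sampling from each group separately guarantees that every tree sees at least some informative features, which is the general motivation for stratified sampling over purely uniform feature selection.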

Keywords: MapReduce; random forest (RF); DRIGFN strategy; feature grouping strategy based on information theory (FGSIT); redistribution of key-value pairs (RSKP) strategy

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org
