Jisuanji kexue (Dec 2021)
Noise Tolerable Feature Selection Method for Software Defect Prediction
Abstract
Software defect prediction identifies defective modules in advance by mining defect datasets, helping testers test in a more targeted way. However, label noise is ubiquitous in these datasets and degrades the performance of the prediction model. Few feature selection methods have been designed specifically for noise tolerance. In addition, in the mainstream noise-tolerant feature selection framework, the selection strategy can only be chosen manually based on human experience, which makes the framework difficult to apply in software engineering. In view of this, this paper proposes a novel method, NTFES (noise tolerable feature selection). Specifically, NTFES first generates multiple Bootstrap samples by Bootstrap sampling. It then divides the original features into groups on the Bootstrap samples using the approximate Markov blanket and selects candidate features from each group based on two heuristic feature selection strategies. Subsequently, it uses a genetic algorithm (GA) to search for the optimal feature subset in the candidate feature space. To verify the effectiveness of the proposed method, this paper uses the NASA MDP dataset and injects label noise into it to simulate noisy datasets. It then compares NTFES with classical baseline methods, such as FULL, FCBF, and CFS, while controlling the ratio of label noise. The experimental results show that the proposed method achieves higher classification performance and better noise tolerance when the ratio of label noise is within an acceptable range.
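The two data-preparation steps named in the abstract, label noise injection and Bootstrap sampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how noise is injected or how many Bootstrap samples are drawn, so the sketch assumes binary labels (0 = clean, 1 = defective), noise injected by flipping a uniformly random fraction of labels, and five Bootstrap samples. The function names `inject_label_noise` and `bootstrap_samples` are hypothetical.

```python
import random


def inject_label_noise(labels, noise_ratio, seed=0):
    """Flip a fraction `noise_ratio` of binary labels (0/1).

    Assumption: the labels to flip are chosen uniformly at random.
    """
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = int(round(noise_ratio * len(labels)))
    # Pick distinct positions so exactly n_flip labels change.
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        noisy[i] = 1 - noisy[i]
    return noisy


def bootstrap_samples(n_rows, n_samples, seed=0):
    """Return `n_samples` lists of row indices, each drawn with replacement."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_rows) for _ in range(n_rows)]
        for _ in range(n_samples)
    ]


# Toy data: 80 clean modules and 20 defective ones (illustrative only).
labels = [0] * 80 + [1] * 20
noisy = inject_label_noise(labels, noise_ratio=0.1)
flipped = sum(a != b for a, b in zip(labels, noisy))
samples = bootstrap_samples(len(labels), n_samples=5)
```

Each index list in `samples` selects the rows of one Bootstrap sample. Under the method described above, feature grouping and candidate selection would then run on each such sample before the GA search.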
Keywords