Nuclear Physics B (May 2023)
Automated feature selection procedure for particle jet classification
Abstract
The high dimensionality of the data produced in high-energy physics experiments makes machine learning algorithms, such as neural networks, necessary to improve the reconstruction and classification of the analyzed events. Interpretability, i.e. the capability to explain the dynamics that lead the network to a certain outcome, has emerged as a major need as architectures grow in complexity. In the analysis of pp collisions at the LHC, explainability first concerns assessing the relative importance of the high-level observables used to classify events. In this context, we have developed a method to select the most important features associated with a particle jet whose origin we want to establish. Features are ranked by importance with a decision tree algorithm. A k-fold cross-validation is applied to increase confidence in the extracted ranking. We tested the method on the case of highly boosted di-jet resonances decaying to two b-quarks, which a deep neural network must select against an overwhelming QCD background. We show that noisy and irrelevant features are rejected while relevant features occupy the top-ranking positions.
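The ranking step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes scikit-learn's impurity-based `feature_importances_` as the importance measure, a single shallow `DecisionTreeClassifier` per fold, and a synthetic toy dataset. The feature names (`feat_strong`, `feat_weak`, `noise_a`, `noise_b`), the tree depth, and the number of folds are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def rank_features(X, y, names, n_splits=5, seed=0):
    """Rank features by decision-tree importance averaged over k folds.

    Returns a list of (name, mean_importance, std_importance) tuples,
    sorted from most to least important.
    """
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    per_fold = []
    for train_idx, _ in folds.split(X, y):
        # Assumption: one shallow tree per fold, trained on that fold's
        # training part only; impurity-based importances are recorded.
        tree = DecisionTreeClassifier(max_depth=4, random_state=seed)
        tree.fit(X[train_idx], y[train_idx])
        per_fold.append(tree.feature_importances_)
    per_fold = np.asarray(per_fold)
    mean = per_fold.mean(axis=0)
    std = per_fold.std(axis=0)  # fold-to-fold spread of each importance
    order = np.argsort(mean)[::-1]
    return [(names[i], float(mean[i]), float(std[i])) for i in order]


# Synthetic toy data standing in for jet observables (hypothetical).
rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)  # 1 = signal, 0 = background
X = np.column_stack([
    y + rng.normal(0.0, 0.4, n),   # strongly discriminating feature
    -y + rng.normal(0.0, 0.7, n),  # weaker discriminating feature
    rng.normal(0.0, 1.0, n),       # pure noise
    rng.normal(0.0, 1.0, n),       # pure noise
])
names = ["feat_strong", "feat_weak", "noise_a", "noise_b"]

ranking = rank_features(X, y, names)
for name, mean, std in ranking:
    print(f"{name:12s} {mean:.3f} +/- {std:.3f}")
```

Averaging over folds, and inspecting the fold-to-fold spread, is what gives some confidence that a feature's position in the ranking does not depend on one particular training split.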