Journal of Mathematics (Jan 2024)
A Group Feature Screening Procedure Based on Pearson Chi-Square Statistic for Biology Data with Categorical Response
Abstract
The analysis of biogenetic data contributes substantially to understanding disease mechanisms and diagnosing rare diseases. In such analyses, selecting the features that significantly affect a disease provides an effective basis for subsequent diagnosis and treatment decisions. This task is not simple, because biogenetic data pose challenges such as ultrahigh-dimensional candidate features, imbalanced response variables, and genetic associations. This study focuses on group structure in feature screening for biogenetic data. Because biogenetic data have a group structure, the entire genome needs to be analyzed rather than individual, strongly correlated genes. To address this issue, this study proposes a group feature screening method that accounts for group correlations by using an adjusted Pearson chi-square statistic. The method can be applied to both continuous and discrete covariates. Simulation studies illustrate its performance and show that it performs well with imbalanced data and multicategory responses. In an application to lung cancer diagnosis, the proposed method classifies imbalanced data well, and dimension reduction with linear discriminant analysis still performs well.
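The idea of screening covariates in groups with a Pearson chi-square statistic can be sketched as follows. This is a minimal illustration, not the authors' procedure. The function names (`pearson_chi2`, `discretize`, `group_scores`, `screen_groups`) are hypothetical. The adjustment used here, dividing each statistic by its degrees of freedom and averaging within a group, is an assumption made for illustration. Quantile binning of continuous covariates is likewise an assumption.

```python
import numpy as np

def pearson_chi2(x, y):
    """Pearson chi-square statistic and degrees of freedom for two categorical vectors."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # Contingency table of observed counts.
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    # Expected counts under independence; all margins are positive by construction.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    stat = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return float(stat), int(dof)

def discretize(x, bins=4):
    """Slice a continuous covariate into quantile bins (illustrative assumption)."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, edges)

def group_scores(X, y, groups):
    """Score each group by the mean of chi2/dof over its member features.

    The chi2/dof scaling is an assumed, illustrative adjustment so that
    features with different numbers of categories are comparable.
    """
    groups = np.asarray(groups)
    scores = {}
    for g in np.unique(groups):
        cols = np.flatnonzero(groups == g)
        vals = []
        for j in cols:
            stat, dof = pearson_chi2(X[:, j], y)
            vals.append(stat / max(dof, 1))
        scores[g.item()] = float(np.mean(vals))
    return scores

def screen_groups(X, y, groups, keep):
    """Return the labels of the `keep` highest-scoring groups."""
    scores = group_scores(X, y, groups)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]
```

In this sketch, `X` is an array of categorical covariates with one column per feature, `y` is the categorical response, and `groups` assigns an integer group label to each column. Continuous columns would first be passed through `discretize`. Whole groups, rather than single features, are kept or discarded together.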