IEEE Access (Jan 2023)
Intensive Statistical Exploration to Identify Osteoporosis Predisposing Factors and Optimizing Recognition Performance With Integrated GP Kernels
Abstract
Osteoporosis, a common skeletal disorder, requires identification of its underlying risk factors, and of their relationships with the response class attribute, to develop effective preventive measures. Various machine learning (ML) algorithms and feature selection approaches have been used to estimate the risk of osteoporosis. However, ML-based algorithms may struggle to detect risk factors and to grade osteoporosis because the data are measured on different scales and the algorithms rely on probability distribution assumptions. These assumptions may be violated, and the results may be interpreted improperly, in the presence of heteroscedasticity (unequal variance) in the data. In this study, we seek to overcome distributional assumption constraints and improve the interpretability of our results by using rigorous statistical approaches, ensuring a robust and trustworthy study of osteoporosis risk variables. The study dataset consists of 40 clinical, lifestyle, and genetic attributes, allowing for a comprehensive analysis of potential risk factors associated with osteoporosis. After the normality assumption is confirmed using the Kolmogorov-Smirnov and Shapiro-Wilk tests, independent t-tests indicate that ALT, FBG, HDL-C, LDL-C, FNT, TL, TLT, and URIC have a significant effect on the risk of developing osteoporosis. The Mann-Whitney U test for the non-normal FN variable likewise yields a p-value below 0.05, indicating that this variable has a significant effect on the likelihood of developing osteoporosis. Based on the chi-square test p-values for the categorical factors, gender, calcium, calcitriol, bisphosphonate, calcitonin, COPD, CAD, and drinking are significantly associated with osteoporosis. To develop the predictive Gaussian process (GP) model, we propose two customized integrated GP kernels to enhance the modeling of complex relationships within the data. The proposed GP kernel model (modified kernel 2) outperforms the individual kernels in this experiment, achieving the best accuracy of 86.64% and an AUC of 86.63% on the osteoporosis data. Moreover, a simulation study is conducted to assess the robustness of the proposed model; across the evaluation metrics, the results improve by 0.60% to 11.41% in accuracy and by 0.50% to 11.60% in AUC.
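The abstract does not state how the two integrated kernels are constructed. As a rough illustration only, the sketch below assumes one common way of integrating GP kernels: combining standard kernels by weighted sum and product, both of which preserve kernel validity when the weights are non-negative. The choice of base kernels (RBF and rational quadratic), the `weight` parameter, and the function names `combined_kernel` and `gram_matrix` are hypothetical and are not taken from the study.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf(x, y, length_scale=1.0):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-squared_distance(x, y) / (2.0 * length_scale ** 2))


def rational_quadratic(x, y, length_scale=1.0, alpha=1.0):
    """Rational quadratic kernel."""
    scaled = squared_distance(x, y) / (2.0 * alpha * length_scale ** 2)
    return (1.0 + scaled) ** (-alpha)


def combined_kernel(x, y, weight=0.5):
    """Hypothetical integrated kernel (an assumption, not the paper's kernel):
    a weighted sum of the two base kernels plus their product."""
    k1 = rbf(x, y)
    k2 = rational_quadratic(x, y)
    return weight * k1 + (1.0 - weight) * k2 + k1 * k2


def gram_matrix(points, kernel):
    """Covariance (Gram) matrix of a kernel over a list of points."""
    return [[kernel(p, q) for q in points] for p in points]


# Toy inputs standing in for standardized feature vectors.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, combined_kernel)
```

In a full GP classifier, a matrix such as `K` would serve as the prior covariance over the training samples, and the base-kernel hyperparameters would be tuned rather than fixed as they are here.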
Keywords