Applied Medical Informatics (Sep 2023)
Min-Max, Min-Max-Median, and Min-Max-IQR in Deciding Optimal Diagnostic Thresholds: Performances of a Logistic Regression Approach on Simulated and Real Data
Abstract
Combining biomarkers and their statistics is used to increase the prediction performance of a diagnosis, but no gold-standard method exists. We introduced and evaluated an approach that uses linear combinations of summary-based statistics, tested in logistic regression models with repeated 10-fold cross-validation. On the real data set, we used the area under the receiver operating characteristic (ROC) curve (AUC), the Youden index, sensitivity (Se), specificity (Sp), diagnostic odds ratio (DOR), Efficiency Index (EI), and Inefficiency Index (InI) as performance metrics. We tested the approaches in multivariate normal distribution simulations with 4, 10, and 100 biomarkers and on real data. The results show that the summary-based models, especially the minimum-maximum-median regression model (LR(MMM)) and the minimum-maximum-interquartile range model (LR(MMIQR)), perform similarly to or slightly better than the classical LR model, regardless of the imposed biomarker means or covariance matrices, on both simulated and real data. The differences in AUCs grew as the number of combined biomarkers increased (LR(MMIQR) model vs. LR model: 0.09 for four biomarkers with equal or unequal means; 0.26 with equal means and 0.11 with unequal means for 10 biomarkers). In real data, the linear combination of four biomarkers in LR(MMM) and LR(MMIQR) slightly increased the AUCs compared to the LR model, but the models' performances remained low and without clinical relevance. The linear combination of summary-based statistics, specifically LR(MMM) and LR(MMIQR), performs similarly to the classical LR model when biomarkers are linearly combined to increase diagnostic accuracy. Although the models perform well on the simulated data sets, no clinical relevance of the combination was observed in the real data to which they were applied.
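As an illustration only, the summary-based predictors and some of the threshold metrics named in the abstract could be computed along the following lines. This is a minimal sketch and not the authors' implementation. The function names `summary_features` and `diagnostic_metrics` are hypothetical, the quartile convention (Python's default "exclusive" method) is an assumption, and the sketch presumes that each subject's biomarkers are already on a comparable scale. Model fitting and cross-validation are left out.

```python
import statistics


def summary_features(biomarkers):
    """Collapse one subject's biomarker values into summary statistics.

    Hypothetical helper: min, max, and median correspond to the LR(MMM)
    predictors; min, max, and IQR correspond to the LR(MMIQR) predictors.
    """
    # Assumption: quartiles follow statistics.quantiles' default
    # 'exclusive' method; other quartile conventions give other IQRs.
    q1, _, q3 = statistics.quantiles(biomarkers, n=4)
    return {
        "min": min(biomarkers),
        "max": max(biomarkers),
        "median": statistics.median(biomarkers),
        "iqr": q3 - q1,
    }


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute Se, Sp, Youden index, and DOR from confusion-matrix counts."""
    se = tp / (tp + fn)  # sensitivity: true-positive rate
    sp = tn / (tn + fp)  # specificity: true-negative rate
    # DOR is undefined when fp or fn is zero; report infinity in that case.
    dor = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return {"Se": se, "Sp": sp, "Youden": se + sp - 1, "DOR": dor}


if __name__ == "__main__":
    # Made-up values for one subject with four biomarkers.
    print(summary_features([1.0, 2.0, 3.0, 4.0]))
    # Made-up confusion-matrix counts at a single candidate threshold.
    print(diagnostic_metrics(tp=40, fp=10, fn=10, tn=40))
```

In a workflow of the kind the abstract describes, the summary features would serve as predictors in a logistic regression model, and the metrics would be evaluated at a threshold chosen on the predicted probabilities, for example the one that maximizes the Youden index.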