Informatics in Medicine Unlocked (Jan 2021)

Prediction of secondary testosterone deficiency using machine learning: A comparative analysis of ensemble and base classifiers, probability calibration, and sampling strategies in a slightly imbalanced dataset

  • Monique Tonani Novaes,
  • Osmar Luiz Ferreira de Carvalho,
  • Pedro Henrique Guimarães Ferreira,
  • Taciana Leonel Nunes Tiraboschi,
  • Caroline Santos Silva,
  • Jean Carlos Zambrano,
  • Cristiano Mendes Gomes,
  • Eduardo de Paula Miranda,
  • Osmar Abílio de Carvalho Júnior,
  • José de Bessa Júnior

Journal volume & issue
Vol. 23
p. 100538

Abstract

Testosterone is the most important male sex hormone, and its deficiency causes considerable physical and mental harm. Efficiently identifying individuals with low testosterone is crucial before starting proper treatment. However, routine monitoring of testosterone levels can be costly in many regions, resulting in underreporting of cases, especially in developing countries. Moreover, few studies employ machine learning (ML) to predict testosterone deficiency. This research therefore aims to offer a coherent comparative analysis of ML methods that can predict testosterone deficiency without having patients undergo costly medical tests. In doing so, we seek to provide the urological community with a publicly available dataset (https://github.com/osmarluiz/Testosterone-Deficiency-Dataset) to encourage research in this as yet untapped field. For this analysis, we used ten base classifiers (optimized by grid search with stratified K-fold cross-validation), three ensemble methods, and eight sampling strategies to analyze a total of 3397 patients. The analysis was based on six features (age, abdominal circumference, triglycerides, high-density lipoprotein, diabetes, and hypertension), all of which were obtained by low-cost exams. We compared the performance of the sampling strategies and the classifiers on an independent test set using ranking (PR-AUC), probabilistic (Brier score), and threshold metrics.
We found that: (1) on the ranking metrics, sampling strategies did not enhance results in this slightly imbalanced (4:1 ratio) dataset; (2) the ensemble classifier using a weighted average presented the best performance; (3) the best base classifier was XGBoost; (4) calibration yielded significant improvements for the sampling strategies and slight improvements when no sampling was used; (5) McNemar's test indicated statistically similar results among all classifiers; and (6) abdominal circumference (AC) had by far the highest feature importance, followed by triglycerides (TG). Age showed very little importance in predicting testosterone deficiency.
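
As a rough illustration of three quantities named above (the Brier score, a weighted-average ensemble of predicted probabilities, and McNemar's test), the following standard-library Python sketch may help. It is not the authors' implementation: the function names, the toy labels and probabilities, the ensemble weights, the 0.5 decision threshold, and the choice of the exact (binomial) form of McNemar's test are all assumptions made for illustration only.

```python
from math import comb


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def weighted_average(prob_lists, weights):
    """Soft-voting ensemble: per-sample weighted mean of each classifier's probability."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, probs)) / total
        for probs in zip(*prob_lists)
    ]


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    # b: classifier A correct, B wrong; c: A wrong, B correct.
    b = sum(1 for y, a, c_ in zip(y_true, pred_a, pred_b) if a == y and c_ != y)
    c = sum(1 for y, a, c_ in zip(y_true, pred_a, pred_b) if a != y and c_ == y)
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree on correctness
    k = min(b, c)
    # Under the null hypothesis, discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (hypothetical, NOT taken from the study's dataset).
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
probs_a = [0.8, 0.2, 0.4, 0.1, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
probs_b = [0.6, 0.1, 0.6, 0.2, 0.7, 0.2, 0.4, 0.6, 0.3, 0.2]

# Hypothetical weights; in practice they would be chosen on validation data.
ensemble = weighted_average([probs_a, probs_b], weights=[2, 1])

# Threshold at 0.5 (an assumption) to obtain class labels for McNemar's test.
pred_a = [int(p >= 0.5) for p in probs_a]
pred_b = [int(p >= 0.5) for p in probs_b]

print("Brier A:", brier_score(y_true, probs_a))
print("Brier B:", brier_score(y_true, probs_b))
print("Brier ensemble:", brier_score(y_true, ensemble))
print("McNemar p-value (A vs B):", mcnemar_exact(y_true, pred_a, pred_b))
```

A lower Brier score indicates better-calibrated probabilities, and a large McNemar p-value means the two classifiers' error patterns cannot be distinguished statistically, which is the sense in which results can be "statistically similar".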

Keywords