PLoS ONE (Jan 2023)

Imputation of missing values for cochlear implant candidate audiometric data and potential applications.

  • Cole Pavelchek,
  • Andrew P Michelson,
  • Amit Walia,
  • Amanda Ortmann,
  • Jacques Herzog,
  • Craig A Buchman,
  • Matthew A Shew

DOI
https://doi.org/10.1371/journal.pone.0281337
Journal volume & issue
Vol. 18, no. 2
p. e0281337

Abstract

Objective: Assess the real-world performance of popular imputation algorithms on cochlear implant (CI) candidate audiometric data.

Methods: 7,451 audiograms from patients undergoing CI candidacy evaluation were pooled from 32 institutions, with complete case analysis yielding 1,304 audiograms. Imputation model performance was assessed with nested cross-validation on randomly generated sparse datasets with various amounts of missing data, distributions of sparsity, and dataset sizes. A threshold for safe imputation was defined as root mean square error (RMSE) […]

Results: Greater quantities of missing data were associated with worse performance. Sparsity in audiometric data is not uniformly distributed, as inter-octave frequencies are less commonly tested. With 3 to 8 missing features per instance, a real-world sparsity distribution was associated with significantly better performance compared to other sparsity distributions (Δ RMSE 0.3 dB to 5.8 dB, non-overlapping 99% confidence intervals). With a real-world sparsity distribution, models were able to safely impute up to 6 missing datapoints in an 11-frequency audiogram. MICE consistently outperformed other models across all metrics and sparsity distributions (p […]

Conclusion: Precision medicine will inevitably play an integral role in the future of hearing healthcare. These methods are data dependent, and rigorously validated imputation models are a key tool for maximizing datasets. Using the largest CI audiogram dataset to date, we demonstrate that in a real-world scenario MICE can safely impute missing data for the vast majority (>99%) of audiograms, with RMSE well below a clinically significant threshold of 10 dB. Evaluation across a range of dataset sizes and sparsity distributions suggests a high degree of generalizability to future applications.
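The evaluation idea in the Methods (hide known values in complete audiograms, impute them, and score the imputed values against the held-out truth with RMSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the audiograms are synthetic and made up, masking is uniformly random rather than following a real-world sparsity distribution, there is no nested cross-validation, and the imputer is a simple per-frequency mean used as a stand-in for MICE. All function names here are hypothetical.

```python
import math
import random

SAFE_RMSE_DB = 10.0  # clinically significant threshold quoted in the abstract


def mask_random(audiograms, n_missing, rng):
    """Return a copy with n_missing thresholds per audiogram replaced by None."""
    masked = []
    for row in audiograms:
        hidden = set(rng.sample(range(len(row)), n_missing))
        masked.append([None if i in hidden else v for i, v in enumerate(row)])
    return masked


def impute_column_mean(masked):
    """Stand-in imputer (NOT MICE): fill each gap with that frequency's observed mean."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def rmse_on_hidden(truth, masked, imputed):
    """RMSE in dB computed only over the entries that were hidden."""
    sq = [
        (t - p) ** 2
        for t_row, m_row, p_row in zip(truth, masked, imputed)
        for t, m, p in zip(t_row, m_row, p_row)
        if m is None
    ]
    return math.sqrt(sum(sq) / len(sq))


def synthetic_audiograms(n, n_freqs, rng):
    """Made-up sloping hearing-loss curves; purely illustrative data."""
    rows = []
    for _ in range(n):
        base = rng.uniform(40, 80)
        slope = rng.uniform(2, 6)
        rows.append([min(120.0, base + slope * j + rng.gauss(0, 3)) for j in range(n_freqs)])
    return rows


rng = random.Random(0)
truth = synthetic_audiograms(200, 11, rng)   # 11-frequency audiograms, as in the abstract
masked = mask_random(truth, 3, rng)          # hide 3 thresholds per audiogram
imputed = impute_column_mean(masked)
rmse = rmse_on_hidden(truth, masked, imputed)
print(f"RMSE on hidden thresholds: {rmse:.1f} dB (safe: {rmse < SAFE_RMSE_DB})")
```

Scoring only the hidden entries matters, since observed values are copied through unchanged and would otherwise pull the error toward zero. A MICE-style imputer (for example, scikit-learn's `IterativeImputer`) could be substituted for `impute_column_mean` without changing the masking or scoring steps.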