Journal of Statistics and Data Science Education (Jan 2024)

What is Missing in Missing Data Handling? An Evaluation of Missingness in and Potential Remedies for Doctoral Dissertations and Subsequent Publications that Use NHANES Data

  • Hairui Yu,
  • Suzanne E. Perumean-Chaney,
  • Kathryn A. Kaiser

DOI
https://doi.org/10.1080/26939169.2023.2177214
Journal volume & issue
Vol. 32, no. 1
pp. 3 – 10

Abstract

Missing data can significantly influence the results of epidemiological studies. The National Health and Nutrition Examination Survey (NHANES) is a popular epidemiological dataset. Using NHANES data as a case study, we examined how recent doctoral dissertations and their subsequent publications report the amount of missing data, the assumed underlying mechanisms, and the methods used to handle missing data. We also explored the coverage of missing data handling in top-selling applied statistics textbooks. Thirty-seven doctoral dissertations (published from 2007 to 2017) and 17 subsequent journal articles were included in the analysis. Overall, 29 (78.4%) dissertations did not explicitly state whether they had missing data. Five (62.5%) of the eight dissertations that reported missing data did not report an assumed mechanism of missingness. Only one subsequent journal article reported the percentage of missing data for key variables. Twenty-eight (75.7%) dissertations and 16 (94.1%) journal articles reported the use of NHANES sample weights. Of the 20 top-selling applied statistics/biostatistics textbooks examined, 14 did not mention imputation. This sample reflects poor rigor in the analysis, reporting, and handling of missing data among recent graduates, and poor coverage of the topic in textbooks. Checklist utilization and improved statistical training on missing data handling are needed. Supplementary materials for this article are available online.

Keywords