PLoS ONE (Jan 2022)

Data integration of National Dose Registry and survey data using multivariate imputation by chained equations

  • Ryu Kyung Kim,
  • Young Min Kim,
  • Won Jin Lee,
  • Jongho Im,
  • Juhee Lee,
  • Ye Jin Bang,
  • Eun Shil Cha

Journal volume & issue
Vol. 17, no. 6

Abstract

Introduction: Data integration is the process of merging information from multiple datasets generated by different sources, which can yield more information than a single data source. All diagnostic medical radiation workers were enrolled in the National Dose Registry (NDR) from 1996 to 2011, and these records were linked with mortality and cancer registry data (https://kdca.go.kr/). A survey was conducted during 2012-2013 using a self-reported questionnaire on occupational radiation practices among diagnostic medical radiation workers.

Methods: Data integration of the NDR and the survey was performed using the multivariate imputation by chained equations (MICE) algorithm.

Results: The results were compared by sex and type of job because the characteristics of the target variables for imputation depend on these variables. For the frequency of interventional therapy among nurses, the observed and pooled means differed because the distribution of medical facility type differed between the observed and completed data. For the marital status of males and females, and the pregnancy status of females, the observed and pooled means differed because the distribution of year of birth differed between the observed and completed data. For lifetime smoking status, the percentage with smoking experience was higher in the completed data than in the observed data, which could be due to reasons such as underreporting among females and, for males, a difference in the distribution of drinking frequency between the observed and completed data.

Conclusion: Data integration allows survey information to be obtained for NDR units without additional surveys, saving the time and cost of conducting them.
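To make the comparison of observed and pooled means concrete, the following is a minimal sketch of chained-equations imputation in Python with NumPy. It is not the authors' implementation: the data are synthetic, the function names `impute_chained` and `pooled_mean` are hypothetical, every variable is treated as continuous and imputed by linear regression with added residual noise, and regression parameters are not drawn from their posterior as a full MICE implementation would do. Column 0 stands in for a registry variable known for every unit; columns 1 and 2 stand in for survey variables that are missing for units outside the survey.

```python
import numpy as np

def impute_chained(X, n_iter=10, rng=None):
    """Return one completed copy of X (NaN = missing) via chained regressions."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.array(X, dtype=float)          # work on a copy
    miss = np.isnan(X)
    # Initialise missing cells with the observed column means.
    col_means = np.nanmean(X, axis=0)
    X[miss] = col_means[np.where(miss)[1]]
    for _ in range(n_iter):
        for j in np.where(miss.any(axis=0))[0]:
            obs, mis = ~miss[:, j], miss[:, j]
            # Regress column j on an intercept and all other columns,
            # fitting only on rows where column j was actually observed.
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ beta
            dof = max(int(obs.sum()) - A.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Replace missing cells with prediction plus random noise.
            X[mis, j] = A[mis] @ beta + rng.normal(0.0, sigma, int(mis.sum()))
    return X

def pooled_mean(X, col, m=5, n_iter=10, seed=0):
    """Average the column mean over m completed datasets."""
    rng = np.random.default_rng(seed)
    means = [impute_chained(X, n_iter, rng)[:, col].mean() for _ in range(m)]
    return float(np.mean(means))

# Synthetic example: survey variables are missing for units with x < 0.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                                # registry variable
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)     # survey variable 1
z = 1.0 - x + rng.normal(scale=0.5, size=n)           # survey variable 2
full = np.column_stack([x, y, z])
data = full.copy()
not_surveyed = x < 0
data[not_surveyed, 1:] = np.nan

observed_mean = float(np.nanmean(data[:, 1]))   # survey respondents only
pooled = pooled_mean(data, col=1)               # all registry units
print("observed mean:", observed_mean)
print("pooled mean:  ", pooled)
```

In this sketch the observed and pooled means differ because the distribution of the registry variable differs between surveyed units and the full set of units, which is the same kind of explanation the Results give for the differences seen in the integrated NDR data.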