Quality assurance of integrative big data for medical research within a multihospital system

Yi-Chia Lee; Ying-Ting Chao; Pei-Ju Lin; Yen-Yun Yang; Yu-Cih Yang; Cheng-Chieh Chu; Yu-Chun Wang; Chin-Hao Chang; Shu-Lin Chuang; Wei-Chun Chen; Hsing-Jen Sun; Hsin-Cheng Tsou; Cheng-Fu Chou; Wei-Shiung Yang

Journal of the Formosan Medical Association (Sep 2022)

Affiliations

Yi-Chia Lee: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan; Department of Internal Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan; Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
Ying-Ting Chao: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan; Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
Pei-Ju Lin: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan
Yen-Yun Yang: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan
Yu-Cih Yang: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan
Cheng-Chieh Chu: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan
Yu-Chun Wang: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan
Chin-Hao Chang: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan
Shu-Lin Chuang: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan; Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
Wei-Chun Chen: Information Technology Office, National Taiwan University Hospital, Taipei, Taiwan
Hsing-Jen Sun: Information Technology Office, National Taiwan University Hospital, Taipei, Taiwan
Hsin-Cheng Tsou: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan; Clinical Trial Center, National Taiwan University Hospital, Taipei, Taiwan
Cheng-Fu Chou: Information Technology Office, National Taiwan University Hospital, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Wei-Shiung Yang: Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan; Department of Internal Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan; Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
Corresponding author: Wei-Shiung Yang, Department of Medical Research, National Taiwan University Hospital, No. 7, Chung-Shan South Road, Taipei, 10002, Taiwan. Fax: +886 2 23709820.

Journal volume & issue: Vol. 121, no. 9, pp. 1728–1738

Abstract

Background: There is a growing need to build medical big data from the electronic health records collected at different hospitals. Errors inevitably occur in this process, and ways to detect and correct them need to be explored. Methods: Electronic health records of 9,197,817 patients and 53,081,148 visits, totaling about 500 million records for 2006–2016, were transmitted from eight hospitals into an integrated database. We randomly selected 10% of patients, counted the primary keys in their tabulated data, and compared the key counts in the transmitted data with those in the raw data. Errors were identified based on statistical testing and clinical reasoning. Results: Data were recorded in 1573 tables. Of these, 58 (3.7%) had different key counts, with a maximum discrepancy of 16.34 per 1000. Statistically significant differences (P < 0.05) were found in 34 of the 58 tables (58.6%); in 15 of these, the differences were caused by changes in diagnostic codes, wrong accounts, or modified orders, and in the other 19 they were related to the accumulation of hospital visits over time. Of the remaining 24 tables (41.4%), which showed no significant differences, three were revised because of incorrect computer programming or wrong accounts; in the other 21, the programming was correct and the absolute differences were negligible. The applicability of the approach was confirmed using the data of 2,730,883 patients and 15,647,468 patient-visits transmitted during 2017–2018, in which 10 (3.5%) tables were corrected. Conclusion: A considerable amount of inconsistent data arises during the transmission of big data from diverse sources, so systematic validation is essential. Comparing the numbers of primary keys in the tabulated data allows such scattered errors to be identified and corrected rapidly.
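The key-count comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the (patient_id, primary_key) row layout, and the toy tables are all hypothetical, and expressing the discrepancy per 1000 raw keys is an assumption about how the reported rate was computed. The abstract does not name the statistical test used, so none is included here.

```python
import random

def sample_patients(patient_ids, fraction=0.10, seed=0):
    """Randomly select a fraction of patient IDs (the abstract uses 10%)."""
    ids = sorted(set(patient_ids))
    k = max(1, round(len(ids) * fraction))
    return set(random.Random(seed).sample(ids, k))

def count_keys(rows, sampled):
    """Count distinct primary keys among rows of sampled patients.

    Each row is a (patient_id, primary_key) tuple (hypothetical layout).
    """
    return len({pk for pid, pk in rows if pid in sampled})

def compare_tables(raw, transmitted, sampled):
    """Return {table: (raw_count, transmitted_count, diff_per_1000)}
    for every table whose key counts differ between the two sources."""
    mismatches = {}
    for table, raw_rows in raw.items():
        n_raw = count_keys(raw_rows, sampled)
        n_tx = count_keys(transmitted.get(table, []), sampled)
        if n_raw != n_tx:
            # Assumption: discrepancy is expressed per 1000 raw keys.
            per_1000 = 1000 * abs(n_raw - n_tx) / n_raw if n_raw else float("inf")
            mismatches[table] = (n_raw, n_tx, per_1000)
    return mismatches

# Toy data: one key in "diagnosis" is lost during transmission.
raw = {
    "diagnosis": [(1, "d1"), (1, "d2"), (2, "d3"), (3, "d4")],
    "orders": [(1, "o1"), (2, "o2"), (3, "o3")],
}
transmitted = {
    "diagnosis": [(1, "d1"), (1, "d2"), (2, "d3")],
    "orders": [(1, "o1"), (2, "o2"), (3, "o3")],
}
sampled = {1, 2, 3}  # all patients, to keep the toy example deterministic
result = compare_tables(raw, transmitted, sampled)
print(result)
```

In this toy case only the "diagnosis" table is flagged, since "orders" has identical key counts in both sources. In practice, each flagged table would then go through the statistical testing and clinical review that the abstract describes.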

Published in Journal of the Formosan Medical Association

ISSN: 0929-6646 (Print)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Medicine: Medicine (General)
Website: http://www.journals.elsevier.com/journal-of-the-formosan-medical-association/
