Scientific Reports (Mar 2024)

Cluster analysis and visualisation of electronic health records data to identify undiagnosed patients with rare genetic diseases

  • Daniel Moynihan,
  • Sean Monaco,
  • Teck Wah Ting,
  • Kaavya Narasimhalu,
  • Jenny Hsieh,
  • Sylvia Kam,
  • Jiin Ying Lim,
  • Weng Khong Lim,
  • Sonia Davila,
  • Yasmin Bylstra,
  • Iswaree Devi Balakrishnan,
  • Mark Heng,
  • Elian Chia,
  • Khung Keong Yeo,
  • Bee Keow Goh,
  • Ritu Gupta,
  • Tele Tan,
  • Gareth Baynam,
  • Saumya Shekhar Jamuar

DOI
https://doi.org/10.1038/s41598-024-55424-8
Journal volume & issue
Vol. 14, no. 1
pp. 1–9

Abstract

Rare genetic diseases affect 5–8% of the population but are often undiagnosed or misdiagnosed. Electronic health records (EHRs) contain large volumes of data that can be analysed and mined. To improve the diagnostic process for patients living with an undiagnosed rare disease, we performed data mining, in the form of cluster analysis and visualisation, on a database of deidentified health records from 1.28 million patients across 3 major hospitals in Singapore, focusing specifically on Fabry disease and familial hypercholesterolaemia (FH). Against a baseline of 4 known patients with Fabry disease, we identified 2 additional patients with a potential diagnosis, suggesting a possible 50% increase in diagnoses. Similarly, we identified more than 12,000 individuals who fulfil the clinical and laboratory criteria for FH but had not previously been diagnosed. This proof-of-concept study shows that mining EHR data is feasible, albeit with some challenges and limitations.