Bioinformatics and Biology Insights (Sep 2022)

Computational Analyses Reveal Fundamental Properties of the Hemophilia Literature in the Last 6 Decades

  • Tiago JS Lopes,
  • Ricardo Rios,
  • Tatiane Nogueira

DOI
https://doi.org/10.1177/11779322221125604
Journal volume & issue
Vol. 16

Abstract

Read online

Hemophilia is an inherited blood coagulation disorder caused by mutations on the coagulation factors VIII or IX genes. Although it is a relatively rare disease, the research community is actively working on this topic, producing almost 6000 manuscripts in the last 5 years. Given that the scientific literature is increasing so rapidly, even the most avid reader will find it difficult to follow it closely. In this study, we used sophisticated computational techniques to map the hemophilia literature of the last 60 years. We created a network structure to represent authorship collaborations, where the nodes are the researchers and 2 nodes are connected if they co-authored a manuscript. We accurately identified author clusters, namely, researchers who have collaborated systematically for several years, and used text mining techniques to automatically synthesize their research specialties. Overall, this study serves as a historical appreciation of the effort of thousands of hemophilia researchers and demonstrates that a computational framework is able to automatically identify collaboration networks and their research specialties. Importantly, we made all datasets and source code available for the community, and we anticipate that the methods introduced here will pave the way for the development of systems that generate compelling hypothesis based on patterns that are imperceptible to human researchers.