Exploration of Immunology (Nov 2022)

Distributing human leukocyte antigen (HLA) database in histocompatibility: a shift in HLA data governance

  • Sirine Sayadi,
  • Venceslas Douillard,
  • Nicolas Vince,
  • Mario Südholt,
  • Pierre-Antoine Gourraud

DOI: https://doi.org/10.37349/ei.2022.00080
Journal volume & issue: Vol. 2, no. 6, pp. 749–759

Abstract

Aim: Human leukocyte antigen (HLA) population genetics has historically been a field built on centralized data resources. HLA genetics databases typically provide access to allele, haplotype, and genotype frequency information. Among many resources, the Allele Frequency Net Database (AFND) is a typical centralized repository that allows users to research and analyze immune gene frequencies in populations around the world. With the massive increase in medical data and the strengthening of data governance laws, it has become important to propose a new distributed and secure alternative to the historically centralized model of population genetics. In this paper, a new model of HLA population genetic resources has been developed: an alternative, distributed version of HLA databases. It allows users to perform the same research and analyses together with other remote sites without sharing their original data, while monitoring data access.

Methods: This new version uses the Master/Worker distributed model and offers distributed algorithms for calculating allele frequencies, haplotype frequencies, and individual genotypes. The new model was evaluated on Grid’5000, a distributed testbed for experiment-driven research, and achieved good accuracy and execution times compared with the original centralized scheme used by researchers.

Results: The results show that the distributed algorithm applied to HLA population genetics resources enables usage control and allows the data-owning institution to enforce its own security framework. It gives the same results as the centralized scheme for all counting methods in population immunogenetics. With identical frequency estimates, it is much faster in many cases, in particular for large samples.

Conclusions: Distributing previously centralized resources is an interesting perspective for better control of data sharing.
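The abstract does not spell out the distributed algorithms, so what follows is only a minimal illustrative sketch of how a Master/Worker split can compute allele and genotype frequencies without moving raw data. It assumes, hypothetically, that each site (worker) holds single-locus genotypes as pairs of allele names and returns only aggregate counts; the master sums those counts and divides. Function names and the genotype representation are illustrative assumptions, not the authors' code. Because the counts are additive, the pooled result is identical to a centralized computation over all records.

```python
from collections import Counter
from typing import Iterable

# Assumption: a genotype is an unordered pair of allele names at one HLA locus,
# e.g. ("A*01:01", "A*02:01"). Real HLA data formats are richer than this.
Genotype = tuple[str, str]


def worker_local_counts(genotypes: Iterable[Genotype]) -> tuple[Counter, Counter, int]:
    """Hypothetical worker step: runs at a data-owning site and returns only
    aggregate counts (allele counts, genotype counts, sample size), never records."""
    allele_counts: Counter = Counter()
    genotype_counts: Counter = Counter()
    n = 0
    for a1, a2 in genotypes:
        allele_counts[a1] += 1
        allele_counts[a2] += 1
        # Sort the pair so (x, y) and (y, x) count as the same unordered genotype.
        genotype_counts[tuple(sorted((a1, a2)))] += 1
        n += 1
    return allele_counts, genotype_counts, n


def master_aggregate(partials) -> tuple[dict, dict]:
    """Hypothetical master step: sums the workers' counts, then converts them to
    allele frequencies (over 2n chromosomes) and genotype frequencies (over n)."""
    total_alleles: Counter = Counter()
    total_genotypes: Counter = Counter()
    total_n = 0
    for alleles, genos, n in partials:
        total_alleles.update(alleles)  # Counter.update adds counts
        total_genotypes.update(genos)
        total_n += n
    if total_n == 0:
        raise ValueError("no individuals reported by any worker")
    allele_freq = {a: c / (2 * total_n) for a, c in total_alleles.items()}
    genotype_freq = {g: c / total_n for g, c in total_genotypes.items()}
    return allele_freq, genotype_freq


if __name__ == "__main__":
    # Two toy sites with made-up genotypes (illustration only).
    site_a = [("A*01:01", "A*02:01"), ("A*02:01", "A*02:01")]
    site_b = [("A*03:01", "A*01:01")]
    distributed = master_aggregate([worker_local_counts(site_a), worker_local_counts(site_b)])
    centralized = master_aggregate([worker_local_counts(site_a + site_b)])
    print(distributed[0]["A*02:01"])  # 0.5
    print(distributed == centralized)  # True
```

Only counts cross the site boundary here, which is what makes usage control possible at each worker. Haplotype frequencies with unknown phase are usually estimated iteratively (for example with an EM algorithm), which this counting sketch does not cover.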

Keywords