PLoS ONE (Jan 2024)

Bioinformatics pipeline for the systematic mining genomic and proteomic variation linked to rare diseases: The example of monogenic diabetes.

  • Ksenia G Kuznetsova,
  • Jakub Vašíček,
  • Dafni Skiadopoulou,
  • Janne Molnes,
  • Miriam Udler,
  • Stefan Johansson,
  • Pål Rasmus Njølstad,
  • Alisa Manning,
  • Marc Vaudel

DOI
https://doi.org/10.1371/journal.pone.0300350
Journal volume & issue
Vol. 19, no. 4
p. e0300350

Abstract

Monogenic diabetes is a group of diseases caused by rare variants in single genes. As with other rare diseases, multiple genes have been linked to monogenic diabetes with different measures of pathogenicity, but information on these genes and variants is not unified across resources, which makes it difficult to process computationally. We have developed an automated pipeline for collecting and harmonizing data on genetic variants linked to monogenic diabetes. Furthermore, we have translated the variant genetic sequences into protein sequences, accounting for all protein isoforms and their variants. This allows researchers to consolidate information on variant genes and proteins linked to monogenic diabetes and facilitates their study using proteomics or structural biology. Our open and flexible implementation in Jupyter notebooks allows the pipeline to be tailored, modified, and applied to other rare diseases.
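
The translation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`translate`, `apply_snv`), the toy coding sequence, and the example substitution are all hypothetical, and the sketch assumes the variant is already expressed as a 1-based position within the coding sequence of one isoform. A real pipeline would also need to map genomic coordinates onto each transcript and handle insertions, deletions, and splicing effects.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    protein = []
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_snv(cds: str, pos: int, ref: str, alt: str) -> str:
    """Apply a single-nucleotide substitution at a 1-based CDS position."""
    if cds[pos - 1] != ref:
        raise ValueError(f"Reference mismatch at position {pos}")
    return cds[:pos - 1] + alt + cds[pos:]


# Toy coding sequence for one hypothetical isoform (not a real gene).
reference_cds = "ATGGCCTGGTAA"
variant_cds = apply_snv(reference_cds, pos=5, ref="C", alt="T")

print(translate(reference_cds))  # MAW
print(translate(variant_cds))    # MVW
```

Repeating the same two calls for every isoform of a gene, each with its own coding sequence and variant position, would yield one variant protein sequence per isoform, which is the kind of output a proteomics search database requires.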