dbPepVar: A Novel Cancer Proteogenomics Database

Lucas Marques Da Cunha; Patrick Terrematte; Tayna Da Silva Fiuza; Vandeclecio Lira Da Silva; Jose Eduardo Kroll; Sandro Jose De Souza; Gustavo Antonio De Souza

doi:10.1109/ACCESS.2022.3201897

IEEE Access (Jan 2022)

Affiliations

Lucas Marques Da Cunha: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
Patrick Terrematte: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
Tayna Da Silva Fiuza: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
Vandeclecio Lira Da Silva: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
Jose Eduardo Kroll: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
Sandro Jose De Souza: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
Gustavo Antonio De Souza: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte (UFRN), Natal, Brazil

Journal volume & issue: Vol. 10
pp. 90982 – 90994

Abstract

Cancers arise from the acquisition of DNA mutations, such as substitutions, deletions, amplifications, and rearrangements. Understanding the distribution and correlation of such mutations in cancer may aid the characterization of the disease and the subsequent identification of biomarkers for diagnosis and treatment. The proteogenomics database created here (dbPepVar) combines genetic variation information from dbSNP with protein sequences from NCBI's RefSeq. Public mass spectrometry datasets (ovarian, colorectal, breast, and prostate) were used to perform a pan-cancer analysis, allowing the identification of unique genetic variations. As a result, 3,726 variant peptides were identified in samples from patients with ovarian cancer, 2,543 in prostate, 2,661 in breast, and 2,411 in colorectal cancer patients. The data produced by this proteogenomics approach, linked to other biological databases, are now available in an intuitive and dynamic web portal. Novice users can explore general aspects of the dataset in graph or table format, filter the data with click-and-select options, or run more advanced queries with regular expressions (regex). All data can be downloaded in CSV or PDF format. Looking ahead, the web portal may guide studies that aim to identify new therapeutic targets for different cancers, and the database can also be used to characterize variants in samples of unknown genetic background, such as archived samples.
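To make the notion of a variant peptide concrete, the following is a minimal sketch of how a single amino acid substitution can be applied to a reference protein sequence and how the peptides that differ from the reference can then be listed. It is an illustration under stated assumptions, not the pipeline used to build dbPepVar: the function names, the toy sequence, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are all hypothetical choices made here.

```python
from __future__ import annotations

import re


def apply_substitution(protein: str, position: int, ref: str, alt: str) -> str:
    """Return the protein with the residue at a 1-based position changed from ref to alt."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {protein[position - 1]}")
    return protein[: position - 1] + alt + protein[position:]


def tryptic_peptides(protein: str) -> list[str]:
    """Digest in silico: cleave after K or R unless the next residue is P (simplified rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def variant_peptides(reference: str, variant: str) -> list[str]:
    """List peptides from the variant digest that are absent from the reference digest."""
    reference_set = set(tryptic_peptides(reference))
    return [p for p in tryptic_peptides(variant) if p not in reference_set]


# Toy sequence for illustration only; it is not taken from RefSeq or dbSNP.
reference = "MKTAYIAKQRQISFVK"
variant = apply_substitution(reference, position=5, ref="Y", alt="C")
print(variant_peptides(reference, variant))
```

In this sketch a substitution that introduces or removes K or R changes the cleavage pattern itself, so one variant can yield more than one new peptide.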

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
