Makara Journal of Science (Jun 2020)

Protein Annotation of Breast-cancer-related Proteins with Machine-learning Tools

  • Arli Aditya Parikesit,
  • David Agustriawan,
  • Rizky Nurdiansyah

DOI
https://doi.org/10.7454/mss.v24i2.12106
Journal volume & issue
Vol. 24, no. 2
pp. 101 – 111

Abstract

Breast cancer is one of the primary contributors to mortality among women. Several approaches are used to treat it, but recurrence occurs in 79% of cases because the underlying mechanism of the protein molecules has not been carefully examined. The goal of this research was to use machine-learning tools to elucidate conserved regions and to obtain functional annotations of breast-cancer-related proteins. The sequences of five breast-cancer-related proteins (BRCA2, BCAR1, BCAR3, BCAR4, and BRMS1) and their annotations were retrieved from the UniProt and TCGA databases, respectively. Conserved regions were extracted using CLUSTALX. We constructed a phylogenetic tree using MEGA 7.0 and used the SUPERFAMILY database to obtain fine-grained domain annotations. The tree revealed that the BRCA2 and BCAR4 protein sequences are located in the same clade, which indicates that they have overlapping functions. Several protein domains were identified, including the SH2 and Ras GEF domains in BCAR3, the SH3 domain in BCAR1, and the helical, nucleic-acid-binding protein, and tower domains in BRCA2. No protein domains could be annotated for BCAR4 or BRMS1, which may indicate the presence of a disordered protein state. We suggest that each protein has distinct functionalities that are complementary in regulating the progression of breast cancer, although further study is necessary for confirmation. This protein-domain annotation project could be strengthened by fully integrating its mappings with gene and disease ontologies. Such integration is vital for obtaining biochemical insights into breast cancer.
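
To illustrate what extracting conserved regions from a multiple sequence alignment involves, the following is a minimal sketch in Python. It is not the authors' pipeline: the alignment itself would come from a tool such as CLUSTALX, and this sketch only scans an already-aligned set for runs of columns in which every sequence carries the same residue. The function name `conserved_regions`, the `min_len` threshold, and the toy sequences are all assumptions made for illustration; the sequences are invented and are not taken from BRCA2, BCAR1, BCAR3, BCAR4, or BRMS1.

```python
from typing import List, Tuple


def conserved_regions(aligned: List[str], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges that are fully conserved.

    A column counts as conserved when every aligned sequence has the same
    residue there and that residue is not a gap ('-'). Only runs of at
    least `min_len` consecutive conserved columns are reported.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    regions: List[Tuple[int, int]] = []
    start = None  # first column of the run currently being tracked
    for col in range(width):
        residues = {seq[col] for seq in aligned}
        identical = len(residues) == 1 and "-" not in residues
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                regions.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and width - start >= min_len:
        regions.append((start, width))
    return regions


# Hypothetical toy alignment (invented sequences, gaps shown as '-').
toy_alignment = [
    "MKTAYL-GGSHRLE",
    "MKTAYLAGGSHKLE",
    "MKTAYL-GGSHRLE",
]

for start, end in conserved_regions(toy_alignment):
    print(start, end, toy_alignment[0][start:end])
```

Strict column identity is the simplest possible criterion. Real alignment viewers typically also score columns by residue similarity, so this sketch would under-report regions that are conserved only in a biochemical sense.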

Keywords