Life (May 2022)

TCGA-My: A Systematic Repository for Systems Biology of Malaysian Colorectal Cancer

  • Mohd Amin Azuwar,
  • Nor Azlan Nor Muhammad,
  • Nor Afiqah-Aleng,
  • Nurul-Syakima Ab Mutalib,
  • Najwa Farhah Md. Yusof,
  • Ryia Illani Mohd Yunos,
  • Muhiddin Ishak,
  • Sazuita Saidin,
  • Isa Mohamed Rose,
  • Ismail Sagap,
  • Luqman Mazlan,
  • Zairul Azwan Mohd Azman,
  • Musalmah Mazlan,
  • Sharaniza Ab Rahim,
  • Wan Zurinah Wan Ngah,
  • Sheila Nathan,
  • Nurul Azmir Amir Hashim,
  • Zeti-Azura Mohamed-Hussein,
  • Rahman Jamal

DOI
https://doi.org/10.3390/life12060772
Journal volume & issue
Vol. 12, no. 6
p. 772

Abstract

Colorectal cancer (CRC) is the second most common cancer in Malaysia, yet its pathobiology remains unknown. CRC pathobiology can be studied in detail using omics technologies, which generate vast amounts of molecular data. The generation of omics data has, in turn, introduced a new challenge for data organization. Therefore, a knowledge-based repository, TCGA-My, was developed to systematically store and organize CRC omics data from Malaysian patients. TCGA-My stores the genome and metabolome data of Malaysian CRC patients. The genome and metabolome datasets were organized using the Python library pandas. The variants and metabolites were first annotated with their biological information using the Gene Ontology (GO) vocabulary. The TCGA-My relational database was then built using HeidiSQL Portable 9.4.0.512, and Laravel was used to design the web interface. Currently, TCGA-My stores 1,517,841 variants, 23,695 genes, and 167,451 metabolites from the samples of 50 CRC patients. Data entries can be accessed via search and browse menus. TCGA-My aims to offer effective and systematic omics data management, allowing it to become the main resource for Malaysian CRC research, particularly in the context of biomarker identification for precision medicine.

Keywords