IEEE Access (Jan 2022)

Join Classifier of Type and Index Mutation on Lung Cancer DNA Using Sequential Labeling Model

  • Untari Novia Wisesty,
  • Ayu Purwarianti,
  • Adi Pancoro,
  • Amrita Chattopadhyay,
  • Nam Nhut Phan,
  • Eric Y. Chuang,
  • Tati Rajab Mengko

DOI
https://doi.org/10.1109/ACCESS.2022.3142925
Journal volume & issue
Vol. 10
pp. 9004 – 9021

Abstract

Sequential labeling models are commonly used for time series or sequence data, where the label of each instance is predicted using the label of the previous instance. In this work, a sequential labeling model is proposed as a new approach to detect mutation type and mutation index simultaneously, using DNA sequences from lung cancer case studies. The methods used are a One-Dimensional Convolutional Neural Network (1D-CNN), Bidirectional Long Short-Term Memory (BiLSTM), and a Bidirectional Gated Recurrent Unit (Bi-GRU). Each nucleotide in the patient’s DNA sequence is classified as either normal or as carrying a certain type of mutation, in which case its mutation index is also predicted. The mutation types detected are substitution, insertion, deletion, and delins (deletion-insertion). In experiments on the EGFR gene, BiLSTM and Bi-GRU performed better and were more stable than 1D-CNN. Further tests were carried out on the TP53, KRAS, CTNNB1, SMARCA4, CDKN2A, PTPRD, BRAF, ERBB2, and PTPRT genes. The proposed model achieves F1-scores of 0.9596 and 0.9612 using Bi-GRU and BiLSTM, respectively. Based on these results, the model can detect the mutation type and index in a DNA sequence more accurately and faster, without the need for other supporting data or tools and without re-alignment to reference sequences. Users can therefore detect mutation type and index by entering only the DNA sequence.
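
To make the per-nucleotide labeling idea concrete, the following is a minimal sketch of how a sequence of predicted labels could be collapsed into mutation calls. It is not the authors' implementation: the label strings, the `decode_mutations` helper, the 0-based span convention, and the example sequence are all assumptions made for illustration, since the abstract names the mutation classes but not their encoding.

```python
from itertools import groupby

# Hypothetical label set: the abstract lists these classes, but the
# actual encoding used by the model is not given there.
LABELS = ("normal", "substitution", "insertion", "deletion", "delins")


def decode_mutations(labels):
    """Collapse per-nucleotide labels into (type, start, end) spans.

    Positions are 0-based and inclusive (an assumed convention).
    """
    spans = []
    pos = 0
    for label, run in groupby(labels):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        length = len(list(run))
        if label != "normal":
            spans.append((label, pos, pos + length - 1))
        pos += length
    return spans


# Made-up example: one label per nucleotide, as a tagger might emit.
sequence = "ACGTTGCA"
predicted = [
    "normal", "normal", "substitution", "normal",
    "deletion", "deletion", "normal", "normal",
]
assert len(sequence) == len(predicted)

print(decode_mutations(predicted))
# [('substitution', 2, 2), ('deletion', 4, 5)]
```

Note that this simple decoding merges directly adjacent mutations of the same type into one span; a scheme that must keep them apart would need boundary markers in the label set.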

Keywords