International Journal of Computational Intelligence Systems (Jan 2018)
EGFR Microdeletion Mutations Analysis System Model Using Parameters Combinations Generator for Design of RADBAS Neural Network Knowledge Based Identifier
Abstract
The aim of this research is to automate analysis of the EGFR gene as a whole, and especially of those exons with clinically identified microdeletion mutations, which are recorded together with non-mutated nucleotides as long chains of a, c, t, g nucleotides and “-” (microdeletion) symbols in the NCBI database or other sites. In addition, the developed system can analyze data resulting from EGFR gene DNA sequencing or DNA extraction for a new patient and identify regions of potential microdeletion mutations, which clinicians need in order to develop new treatments. Classifiers trained on a limited set of known mutated samples cannot exactly identify mutations and their distribution within a sample, especially for previously unknown mutations. Consequently, classification results are not reliable enough for selecting therapy in personalized medicine. Personalized medicine demands exact therapy, which can be designated only if all combinations of EGFR gene exon mutations are known. We propose a computing system/model based on two modules. The first module trains a knowledge-based radial basis (RADBAS) neural network using a training set produced by a combinatorial microdeletion mutation generator. The second module has two modes of operation: the first mode is an offline simulation that tests the RADBAS neural network with randomly generated microdeletion mutations on exons 18, 19, and 20; the second mode is intended for real-time application using sample patients’ data with microdeletion mutations extracted online from an EGFR mutation database. Both modes include preprocessing of data (extraction, encoding, and masking), identification of distributed mutations (RBNN encoding, counting of the exon mutation distribution, and counting of the EGFR gene mutation distribution), and standard reporting. The system has been implemented in the MATLAB/SIMULINK environment.
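The encoding, identification, and counting steps named above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB/SIMULINK implementation: the numeric codes, the single Gaussian unit centered on the deletion code, the bias and threshold values, and the exon index ranges are all assumptions chosen for illustration. The one element taken from standard usage is the radial basis transfer function exp(-n^2), which is what MATLAB's radbas computes.

```python
import math

# Hypothetical numeric codes; the abstract does not give the actual encoding.
CODES = {"a": 1, "c": 2, "g": 3, "t": 4, "-": 0}


def encode(seq):
    """Map an aligned sequence string to a list of integer codes."""
    return [CODES[ch] for ch in seq.lower()]


def radbas(n):
    """Gaussian radial basis transfer function, exp(-n^2)."""
    return math.exp(-n * n)


def deletion_flags(codes, center=0, bias=4.0, threshold=0.5):
    """Flag positions whose code lies close to the deletion center.

    A single radial basis unit responds with 1.0 when the input equals
    the center and decays quickly as the distance grows. The bias and
    threshold are assumed values, not taken from the paper.
    """
    return [radbas(bias * abs(c - center)) > threshold for c in codes]


def count_by_exon(flags, exons):
    """Count flagged positions per exon (start inclusive, end exclusive)."""
    return {name: sum(flags[start:end]) for name, (start, end) in exons.items()}
```

For example, calling `deletion_flags(encode("acgt--acg-ta"))` and passing the result to `count_by_exon` with the made-up ranges `{"exon18": (0, 6), "exon19": (6, 12)}` yields a per-exon count of flagged microdeletion positions; summing those counts gives a gene-level total. A trained network would use many such units with learned centers rather than the single hand-set unit shown here.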
Keywords