IEEE Access (Jan 2021)
DNA Encoding and STR Extraction for Anomaly Intrusion Detection Systems
Abstract
Deoxyribonucleic acid (DNA) can be used to discover the presence of diseases in the human body. Similarly, its functionality can be leveraged in an intrusion detection system (IDS) to detect attacks against computer systems and network traffic. Various approaches have been proposed for using DNA sequences in IDSs. The most popular is the DNA sequence matching method, which is also used in biology. A previously proposed technique uses the DNA sequence in an IDS to generate a normal signature sequence with an alignment threshold value, but its detection rate is very low. This paper therefore considers the two main factors that affect detection accuracy with DNA sequences: the DNA encoding and the short tandem repeat (STR), i.e., the DNA keys and their positions. It then proposes two DNA encoding methods, DEM3sel and DEMdif, which differ in the length of the DNA sequence and in how the network traffic is represented. DEM3sel uses three characters to represent all 41 network attributes and a single fixed character to distinguish between nominal and numerical attributes. DEMdif uses different characters to represent the network attributes based on the attribute values, and likewise uses a single fixed character to distinguish between nominal and numerical attributes. In both methods, the Teiresias algorithm is used to extract the STR, which includes both the keys and their positions in the network traffic, while the Knuth-Morris-Pratt algorithm is used in the matching process to determine whether the network traffic is normal or an attack. An experiment is conducted to assess the performance of the proposed methods on two standard datasets: KDDCup 99 and NSL-KDD. The experiment is run 30 times for each DNA encoding method. The results show that DEM3sel outperforms DEMdif, with a detection rate, false alarm rate, and detection accuracy of 99.58%, 35.53%, and 92.74%, respectively. The results also show that using more keys and their positions improves the false alarm rate and the accuracy of DEM3sel by up to 26.48% and 1.75%, respectively. Moreover, the performance of DEM3sel is comparable to or better than that of state-of-the-art algorithms. Thus, it can be concluded that the proposed DNA sequence method is suitable for use in an IDS.
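The matching step described above can be illustrated with a minimal Python sketch of Knuth-Morris-Pratt search over a DNA-encoded record. It assumes the record has already been encoded as a string over A, C, G, and T, and that the extracted STR keys are stored as (key, position) pairs. The function names, the toy sequence, and the hit-counting rule are illustrative assumptions, not the encoding tables or the decision rule used in the paper.

```python
def build_failure(pattern):
    """KMP failure table: length of the longest proper prefix of
    pattern[:i+1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text, pattern):
    """Return every start index at which pattern occurs in text
    (overlapping occurrences included)."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


def key_hits(sequence, keys):
    """Count the (key, position) pairs whose key occurs in sequence
    at exactly the recorded position. Hypothetical scoring rule."""
    return sum(1 for key, pos in keys if pos in kmp_find_all(sequence, key))


# Toy DNA-encoded record and toy STR keys (both invented for illustration).
record = "ACGTACGTAC"
normal_keys = [("ACGT", 4), ("GTA", 2), ("TT", 0)]
matched = key_hits(record, normal_keys)
```

In this sketch, a record whose count of matched keys is high relative to the normal signature would be treated as normal traffic; the threshold for that decision is left open because the abstract does not specify it.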
Keywords