IEEE Access (Jan 2019)
Robust Intelligent Malware Detection Using Deep Learning
Abstract
Security breaches due to attacks by malicious software (malware) continue to escalate posing a major security concern in this digital age. With many computer users, corporations, and governments affected due to an exponential growth in malware attacks, malware detection continues to be a hot research topic. Current malware detection solutions that adopt the static and dynamic analysis of malware signatures and behavior patterns are time consuming and have proven to be ineffective in identifying unknown malwares in real-time. Recent malwares use polymorphic, metamorphic, and other evasive techniques to change the malware behaviors quickly and to generate a large number of new malwares. Such new malwares are predominantly variants of existing malwares, and machine learning algorithms (MLAs) are being employed recently to conduct an effective malware analysis. However, such approaches are time consuming as they require extensive feature engineering, feature learning, and feature representation. By using the advanced MLAs such as deep learning, the feature engineering phase can be completely avoided. Recently reported research studies in this direction show the performance of their algorithms with a biased training data, which limits their practical use in real-time situations. There is a compelling need to mitigate bias and evaluate these methods independently in order to arrive at a new enhanced method for effective zero-day malware detection. To fill the gap in the literature, this paper, first, evaluates the classical MLAs and deep learning architectures for malware detection, classification, and categorization using different public and private datasets. Second, we remove all the dataset bias removed in the experimental analysis by having different splits of the public and private datasets to train and test the model in a disjoint way using different timescales. Third, our major contribution is in proposing a novel image processing technique with optimal parameters for MLAs and deep learning architectures to arrive at an effective zero-day malware detection model. A comprehensive comparative study of our model demonstrates that our proposed deep learning architectures outperform classical MLAs. Our novelty in combining visualization and deep learning architectures for static, dynamic, and image processing-based hybrid approach applied in a big data environment is the first of its kind toward achieving robust intelligent zero-day malware detection. Overall, this paper paves way for an effective visual detection of malware using a scalable and hybrid deep learning framework for real-time deployments.
Keywords