MalJPEG: Machine Learning Based Solution for the Detection of Malicious JPEG Images

Aviad Cohen; Nir Nissim; Yuval Elovici

doi:10.1109/ACCESS.2020.2969022

IEEE Access (Jan 2020)

Affiliations

Aviad Cohen: Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Be’er Sheva, Israel
Nir Nissim: Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Be’er Sheva, Israel
Yuval Elovici: Department of Software and Information Engineering, Ben-Gurion University of the Negev, Be’er Sheva, Israel

DOI: https://doi.org/10.1109/ACCESS.2020.2969022
Journal volume & issue: Vol. 8, pp. 19997–20011

Abstract

In recent years, cyber-attacks against individuals, businesses, and organizations have increased. Cyber criminals are always looking for effective vectors to deliver malware to victims in order to launch an attack. Images are used on a daily basis by millions of people around the world, and most users consider images to be safe for use; however, some types of images can contain a malicious payload and perform harmful actions. JPEG is the most popular image format, primarily due to its lossy compression. It is used by almost everyone, from individuals to large organizations, and can be found almost everywhere (on digital cameras and smartphones, websites, social media, etc.). Because of their harmless reputation, massive use, and high potential for misuse, JPEG images are used by cyber criminals as an attack vector. While machine learning methods have been shown to be effective at detecting known and unknown malware in various domains, to the best of our knowledge, machine learning methods have not been used specifically for the detection of malicious JPEG images. In this paper, we present MalJPEG, the first machine learning-based solution tailored specifically to the efficient detection of unknown malicious JPEG images. MalJPEG statically extracts 10 simple yet discriminative features from the JPEG file structure and leverages them with a machine learning classifier in order to discriminate between benign and malicious JPEG images. We evaluated MalJPEG extensively on a real-world representative collection of 156,818 images, which contains 155,013 (98.85%) benign and 1,805 (1.15%) malicious images. The results show that MalJPEG, when used with the LightGBM classifier, demonstrates the highest detection capabilities, with an area under the receiver operating characteristic curve (AUC) of 0.997, a true positive rate (TPR) of 0.951, and a very low false positive rate (FPR) of 0.004.
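
To make the idea of statically extracting features from the JPEG file structure more concrete, the following is a minimal Python sketch. It walks the marker segments of a JPEG byte string without decoding the image and derives a few structural counts and sizes. The abstract does not list the 10 MalJPEG features, so the feature names here (`num_app_segments`, `max_com_size`, `bytes_after_eoi`, and so on) and the helper functions `walk_segments` and `extract_features` are hypothetical illustrations, not the authors' feature set or implementation.

```python
import struct

# Markers that carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))
SOS, COM = 0xDA, 0xFE


def walk_segments(data: bytes):
    """Yield (marker, payload_size) for each header segment after SOI,
    up to and including the first Start of Scan (SOS) segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # malformed structure; stop walking
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in STANDALONE:
            yield marker, 0
            pos += 2
            continue
        # The 2-byte big-endian length includes the length field itself.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, length - 2
        if marker == SOS:
            return  # entropy-coded image data follows
        pos += 2 + length


def extract_features(data: bytes) -> dict:
    """Hypothetical structural features (NOT the paper's 10 features)."""
    segments = list(walk_segments(data))
    app_sizes = [n for m, n in segments if 0xE0 <= m <= 0xEF]
    com_sizes = [n for m, n in segments if m == COM]
    # Simplification: treat the last FFD9 in the file as the EOI marker.
    eoi = data.rfind(b"\xff\xd9")
    trailing = len(data) - (eoi + 2) if eoi != -1 else 0
    return {
        "file_size": len(data),
        "num_segments": len(segments),
        "num_app_segments": len(app_sizes),
        "max_app_size": max(app_sizes, default=0),
        "num_com_segments": len(com_sizes),
        "max_com_size": max(com_sizes, default=0),
        "bytes_after_eoi": trailing,
    }
```

A feature dictionary of this kind could be turned into one row of a numeric table and passed to any tabular classifier; the abstract reports LightGBM as the best-performing choice, but the training setup is not described here and is not reproduced in this sketch.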

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
