APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups

Mohamad Erfan Mazaheri; Alireza Shameli-Sendi

doi:10.1109/ACCESS.2024.3473021

IEEE Access (Jan 2024)

Affiliations

Mohamad Erfan Mazaheri: Faculty of Computer Science and Engineering, Shahid Beheshti University (SBU), Tehran 19839, Iran
Alireza Shameli-Sendi: Faculty of Computer Science and Engineering, Shahid Beheshti University (SBU), Tehran 19839, Iran

DOI: https://doi.org/10.1109/ACCESS.2024.3473021
Journal volume & issue: Vol. 12, pp. 145148–145158

Abstract

Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure is attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important being aiding incident response and facilitating legal recourse. Recent years have witnessed a surge in research efforts aimed at refining methods for attributing malware to specific threat groups. These efforts have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, attribution methodologies still call for further investigation. An effective attribution system depends on a rich dataset. Previous studies in this domain have detailed the process of model training and evaluation using distinct datasets, each characterized by its own strengths, weaknesses, and number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset encompasses 64,440 malware samples attributed to 22 APT groups and spans at least 40 malware families. The samples in the dataset span the years 2020 to 2024, and the APT groups that developed them originate from Russia, South Korea, China, the USA, Nigeria, North Korea, Pakistan, and Belarus. Its richness and breadth render it invaluable for future research in the field of malware attribution.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639