IEEE Access (Jan 2024)
APTracker: A Comprehensive and Analytical Malware Dataset Based on Attribution to APT Groups
Abstract
Malware poses a significant threat to organizations, necessitating robust countermeasures. One such measure is attributing malware to its respective Advanced Persistent Threat (APT) group, which serves several purposes, two of the most important being aiding incident response and facilitating legal recourse. Recent years have witnessed a surge in research aimed at refining methods for attributing malware to specific threat groups. These efforts have leveraged a variety of machine learning and deep learning techniques, alongside diverse features extracted from malware binary files, to develop attribution systems. Despite these advancements, the field still calls for further investigation to improve attribution methodologies. An effective attribution system depends on a rich dataset. Previous studies in this domain have detailed the process of model training and evaluation using distinct datasets, each with its own strengths, weaknesses, and number of samples. In this paper, we scrutinize previous datasets from several perspectives while focusing on analyzing our dataset, which we claim is the most comprehensive in the realm of malware attribution. This dataset comprises 64,440 malware samples attributed to 22 APT groups and spans at least 40 malware families. The samples span the years 2020 to 2024, and the APT groups that developed them originate from Russia, South Korea, China, the USA, Nigeria, North Korea, Pakistan, and Belarus. Its richness and breadth render it invaluable for future research in the field of malware attribution.
Keywords