IEEE Access (Jan 2019)
AppAuth: Authorship Attribution for Android App Clones
Abstract
Android app clone detection has been extensively studied in our community, and a number of effective approaches and frameworks have been proposed and released. However, one open challenge has not been well addressed in previous work: authorship attribution for the detected app clones. Although state-of-the-art approaches can accurately identify repackaged apps in one way or another, no convincing method has been proposed to identify the original app and the authentic author within a repackaged app pair, which greatly limits the practical use of app clone detection techniques. For example, app market maintainers have to manually confirm the identified repackaged app pairs, and in most cases it is difficult for them to make an accurate decision. In this paper, we propose AppAuth, a novel learning-based approach to predict the authorship of app clones. Specifically, for a given Android app clone pair (or a group of identified repackaged apps), AppAuth can accurately infer the original author of the plagiarized apps. Our approach is motivated by traditional authorship attribution studies on binary files. AppAuth first extracts a number of coding-style-related features from the executable .apk files, and then relies on machine learning techniques to train a classification model. We have conducted extensive experiments to evaluate the effectiveness of AppAuth. The experimental results suggest that we are able to infer the authorship of Android app clones with high precision. Our work is the first to tackle this problem systematically, and we believe our efforts can contribute positively to the research community and advance research on app repackaging detection and authorship attribution.
Keywords