Systems and Soft Computing (Dec 2022)
Early detection of violating Mobile Apps: A data-driven predictive model approach
Abstract
Mobile app stores are the primary distributors of mobile applications, and they regularly vet the apps they host. Yet some of these vetting processes may be inadequate or applied late, and the late removal of an application can have adverse consequences for developers and users alike. In this work, we propose a data-driven predictive approach that determines whether a given app will be removed or accepted. It also reports the relevance of each feature, which helps stakeholders interpret the predictions. In turn, our approach can help developers improve their apps and help users download apps that are less likely to be removed. We focus on the Google App store and compile a new data set of 870,515 applications, 56% of which have been removed from the market. Our approach uses bootstrap aggregating (bagging) of multiple XGBoost machine learning classifiers. We propose two models: a user-centered model using 47 features, and a developer-centered model using 37 features that are available before an app is published. On the test set, the models achieve the following areas under the ROC curve (AUC): user-centered = 0.792, developer-centered = 0.762.
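To make the two technical ingredients of the abstract concrete, the following is a minimal sketch of bootstrap aggregating and of the AUC metric. It is not the authors' implementation. To stay standard-library only, it swaps XGBoost for a one-split decision stump as the base learner. All names (`fit_stump`, `fit_bagging`, `roc_auc`) and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]  # maps a feature row to a removal score in [0, 1]


def fit_stump(X: List[Row], y: List[int]) -> Model:
    """Stand-in base learner (assumption): the single best threshold split.

    The paper uses XGBoost here; a stump only keeps the sketch dependency-free.
    """
    best = None  # (squared error, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            p_left = sum(left) / len(left)  # never empty: t is an observed value
            p_right = sum(right) / len(right) if right else 0.5
            err = sum((yi - p_left) ** 2 for yi in left)
            err += sum((yi - p_right) ** 2 for yi in right)
            if best is None or err < best[0]:
                best = (err, j, t, p_left, p_right)
    _, j, t, p_left, p_right = best
    return lambda row: p_left if row[j] <= t else p_right


def fit_bagging(X: List[Row], y: List[int], n_models: int = 10,
                seed: int = 0, fit_base=fit_stump) -> Model:
    """Train each base model on a bootstrap resample and average the scores."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): one feature, label 1 = "removed", 0 = "accepted".
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_bagging(X, y)
scores = [model(row) for row in X]
auc = roc_auc(y, scores)
```

In practice the `fit_base` argument would wrap an XGBoost classifier, and the AUC would be computed on a held-out test set rather than on the training rows as in this toy example.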