Cogent Engineering (Dec 2024)
Behavioural user segmentation of app users based on functionality interaction patterns
Abstract
User segmentation divides a large and complex user base into manageable groups of similar users. Existing work struggles with sparse datasets and with extracting insights from the generated clusters. This study has two objectives: (1) to identify an optimal clustering model that can handle a sparse dataset and (2) to extract post-clustering insights via a descriptive persona for each cluster. This study applied clustering models to a behavioural user-interaction dataset with a sparsity rate of 85%. The findings revealed that Density-Based Spatial Clustering of Applications with Noise (DBSCAN), combined with one-hot encoding and representation learning via an autoencoder, performed best, with a Silhouette score of 0.36. Subsequently, this study applied classification, SHapley Additive exPlanations (SHAP) values, and manual analysis to interpret the clusters. Classification and SHAP values were used to identify the important features that differentiate the clusters created by the different clustering models. Specifically, a linear SHAP explainer was applied to Logistic Regression, which outperformed Random Forest and Light Gradient Boosting Machine with an accuracy of 97%. A manual analysis of the central tendencies of the more important features within each cluster was then performed to create a descriptive persona. The findings revealed four distinctive personas, namely the “Active User,” “COVID-19 Preventer,” “Inactive User,” and “Average Joe.”
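As a rough illustration of the two quantities quoted above, the sketch below computes a sparsity rate and a Silhouette score in plain Python. It is not the study's code: the function names and toy data are hypothetical, and the choices to define sparsity as the fraction of zero entries, to use Euclidean distance, and to skip points labelled -1 (the conventional DBSCAN noise label) are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sparsity_rate(matrix):
    """Fraction of entries equal to zero (assumed definition of sparsity)."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def silhouette_score(points, labels, noise_label=-1):
    """Mean silhouette coefficient over all non-noise points.

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    coefficient is (b - a) / max(a, b).
    """
    # Group points by cluster label, skipping noise (an assumption here).
    clusters = {}
    for point, label in zip(points, labels):
        if label != noise_label:
            clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for label, members in clusters.items():
        for point in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # dist(point, point) is 0, so summing over all members is safe.
            a = sum(dist(point, q) for q in members) / (len(members) - 1)
            b = min(
                sum(dist(point, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy user-by-functionality matrix: 4 users, 5 functionalities, 3 interactions.
toy_interactions = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

# Toy 2-D embeddings with cluster labels; the last point is marked as noise.
toy_points = [(0, 0), (0, 1), (5, 0), (5, 1), (50, 50)]
toy_labels = [0, 0, 1, 1, -1]

if __name__ == "__main__":
    print("sparsity rate:", sparsity_rate(toy_interactions))
    print("silhouette:", silhouette_score(toy_points, toy_labels))
```

A Silhouette score lies between -1 and 1, with higher values indicating tighter, better separated clusters. On real data a library implementation such as scikit-learn's `silhouette_score` would normally be used; it treats -1 as an ordinary cluster label unless noise points are filtered out beforehand.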
Keywords