Scientific Reports (Apr 2023)
Using machine learning to estimate the incidence rate of intimate partner violence
Abstract
It is difficult to accurately estimate the incidence rate of intimate partner violence (IPV) using traditional social survey methods because IPV victims are often reluctant to disclose their experiences, which leads to underestimation of the incidence rate. To address this issue, we applied machine learning algorithms to predict the incidence rate of IPV in China based on data from the Third Wave Survey on the Social Status of Women in China (TWSSSCW 2010). Specifically, we examined five methods for processing unbalanced samples and six machine learning algorithms, and chose the random under-sampling ensemble method and the random forest algorithm to impute the missing data. Analysis of the complete data after imputation showed that the incidence rates of physical violence, verbal violence, and cold violence were 7.10%, 13.74%, and 21.35%, respectively, which were higher than the incidence rates in the original dataset (4.05%, 11.21%, and 17.95%, respectively). The robustness of our findings was further confirmed by analysis using different training sets. Overall, this study demonstrates that better tools need to be developed to accurately estimate the incidence rates of IPV. It also serves as a useful guide for future research that imputes missing data using machine learning.
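The general idea of imputing a rare outcome with a random under-sampling ensemble can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the missing values are binary IPV indicators for respondents who did not answer, uses invented toy data with a single numeric feature, and substitutes a one-feature threshold classifier (a decision stump) for the random forest to stay dependency-free. All function and variable names are hypothetical.

```python
import random

def undersample(rows, labels, rng):
    """Keep every minority-class row plus an equally sized random draw of majority rows."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [rows[i] for i in keep], [labels[i] for i in keep]

def fit_stump(rows, labels):
    """Fit a one-feature threshold rule (a simple stand-in for a random forest)."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for sign in (1, -1):
                preds = [1 if sign * (r[f] - t) >= 0 else 0 for r in rows]
                correct = sum(p == y for p, y in zip(preds, labels))
                if best is None or correct > best[0]:
                    best = (correct, f, t, sign)
    _, f, t, sign = best
    return lambda r: 1 if sign * (r[f] - t) >= 0 else 0

def fit_ensemble(rows, labels, n_models=11, seed=0):
    """Train one base learner per independently under-sampled, balanced subset."""
    rng = random.Random(seed)
    return [fit_stump(*undersample(rows, labels, rng)) for _ in range(n_models)]

def predict(models, row):
    """Majority vote across the ensemble."""
    votes = sum(m(row) for m in models)
    return 1 if 2 * votes > len(models) else 0

def incidence(labels):
    """Share of positive cases."""
    return sum(labels) / len(labels)

# Toy data (invented): 20 negatives, 4 positives, one numeric feature.
observed_rows = [[i % 5] for i in range(20)] + [[6], [7], [8], [9]]
observed_labels = [0] * 20 + [1] * 4

# Respondents whose outcome is missing (invented feature values).
missing_rows = [[1], [8], [3], [9], [0], [7]]

models = fit_ensemble(observed_rows, observed_labels)
imputed = [predict(models, r) for r in missing_rows]

observed_rate = incidence(observed_labels)
completed_rate = incidence(observed_labels + imputed)
print(f"observed: {observed_rate:.4f}, after imputation: {completed_rate:.4f}")
```

In this toy setup the imputed cases contain a larger share of positives than the observed ones, so the rate computed on the completed data exceeds the observed rate, which mirrors the direction of the adjustment reported above. In practice, a library implementation of random forests and of balanced ensembles would replace the stump and the hand-written sampler.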