Journal of Medical Internet Research (Dec 2023)
Effects of Internal and External Factors on Hospital Data Breaches: Quantitative Study
Abstract
Background: Health care data breaches are the most rapidly increasing type of cybercrime; however, the predictors of health care data breaches are uncertain.
Objective: This quantitative study aims to develop a predictive model to explain the number of hospital data breaches at the county level.
Methods: This study evaluated data consolidated at the county level from 1032 short-term acute care hospitals. We considered the association between data breach occurrence (a dichotomous variable) and predictors based on county demographics and socioeconomics, average hospital workload, facility type, and average performance on several hospital financial metrics using 3 model types: logistic regression, perceptron, and support vector machine.
Results: The model coefficient performance metrics indicated convergent validity across the 3 model types for all variables except bad debt and the factor level accounting for counties with >20% and up to 40% Hispanic populations, both of which had mixed coefficient directionality. The support vector machine model performed the classification task best based on all metrics (accuracy, precision, recall, F1-score). All 3 models performed the classification task well, with directional congruence of weights. From the logistic regression model, the top 5 odds ratios (indicating a higher risk of breach) were, in high to low order, inpatient workload, medical center status, pediatric trauma center status, accounts receivable, and the number of outpatient visits. The bottom 5 odds ratios (indicating the lowest odds of experiencing a data breach) occurred for counties with Black populations of >20% and 80% and 40% but <60%, as well as counties with ≤20% Asian or between 80% and 100% Hispanic individuals. Our results are in line with those of other studies that determined that patient workload, facility type, and financial outcomes were associated with the likelihood of health care data breach occurrence.
Conclusions: The results of this study provide a predictive model for health care data breaches that may guide health care managers to reduce the risk of data breaches by raising awareness of the risk factors.
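The abstract compares classifiers on accuracy, precision, recall, and F1-score, and ranks predictors by logistic regression odds ratios. As a minimal sketch of how those quantities are conventionally computed (not the authors' code), the Python below derives the 4 metrics from binary labels and converts a logistic regression coefficient into an odds ratio by exponentiation. The function names and the example labels are hypothetical and chosen only for illustration; none of the numbers come from the study.

```python
import math


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = breach)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log-odds) into an odds ratio."""
    return math.exp(coefficient)


# Hypothetical county-level labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

An odds ratio above 1 (a positive coefficient) corresponds to higher odds of a breach as the predictor increases, and an odds ratio below 1 corresponds to lower odds, which is how the "top 5" and "bottom 5" rankings in the Results can be read.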