Journal of Medical Internet Research (Aug 2020)

Topic Modeling of Social Networking Service Data on Occupational Accidents in Korea: Latent Dirichlet Allocation Analysis

  • Min, Kyoung-Bok,
  • Song, Sung-Hee,
  • Min, Jin-Young

DOI
https://doi.org/10.2196/19222
Journal volume & issue
Vol. 22, no. 8
p. e19222

Abstract

Read online

BackgroundIn most industrialized societies, regulations, inspections, insurance, and legal options are established to support workers who suffer injury, disease, or death in relation to their work; in practice, these resources are imperfect or even unavailable due to workplace or employer obstruction. Thus, limitations exist to identify unmet needs in occupational safety and health information. ObjectiveThe aim of this study was to explore hidden issues related to occupational accidents by examining social network services (SNS) data using topic modeling. MethodsBased on the results of a Google search for the phrases occupational accident, industrial accident and occupational diseases, a total of 145 websites were selected. From among these websites, we collected 15,244 documents on queries related to occupational accidents between 2002 and 2018. To transform unstructured text into structure data, natural language processing of the Korean language was conducted. We performed the latent Dirichlet allocation (LDA) as a topic model using a Python library. A time-series linear regression analysis was also conducted to identify yearly trends for the given documents. ResultsThe results of the LDA model showed 14 topics with 3 themes: workers’ compensation benefits (Theme 1), illicit agreements with the employer (Theme 2), and fatal and non-fatal injuries and vulnerable workers (Theme 3). Theme 1 represented the largest cluster (52.2%) of the collected documents and included keywords related to workers’ compensation (ie, company, occupational injury, insurance, accident, approval, and compensation) and keywords describing specific compensation benefits such as medical expense benefits, temporary incapacity benefits, and disability benefits. In the yearly trend, Theme 1 gradually decreased; however, other themes showed an overall increasing pattern. Certain queries (ie, musculoskeletal system, critical care, and foreign workers) showed no significant variation in the number of queries. ConclusionsWe conducted LDA analysis of SNS data of occupational accident–related queries and discovered that the primary concerns of workers posting about occupational injuries and diseases were workers’ compensation benefits, fatal and non-fatal injuries, vulnerable workers, and illicit agreements with employers. While traditional systems focus mainly on quantitative monitoring of occupational accidents, qualitative aspects formulated by topic modeling from unstructured SNS queries may be valuable to address inequalities and improve occupational health and safety.