Journal of Big Data (Feb 2021)

Annotating and detecting topics in social media forum and modelling the annotation to derive directions-a case study

  • B. Athira,
  • Josette Jones,
  • Sumam Mary Idicula,
  • Anand Kulanthaivel,
  • Enming Zhang

DOI
https://doi.org/10.1186/s40537-021-00429-7
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 23

Abstract

Read online

Abstract The widespread influence of social media impacts every aspect of life, including the healthcare sector. Although medics and health professionals are the final decision makers, the advice and recommendations obtained from fellow patients are significant. In this context, the present paper explores the topics of discussion posted by breast cancer patients and survivors on online forums. The study examines an online forum, Breastcancer.org, maps the discussion entries to several topics, and proposes a machine learning model based on a classification algorithm to characterize the topics. To explore the topics of breast cancer patients and survivors, approximately 1000 posts are selected and manually labeled with annotations. In contrast, millions of posts are available to build the labels. A semi-supervised learning technique is used to build the labels for the unlabeled data; hence, the large data are classified using a deep learning algorithm. The deep learning algorithm BiLSTM with BERT word embedding technique provided a better f1-score of 79.5%. This method is able to classify the following topics: medication reviews, clinician knowledge, various treatment options, seeking and providing support, diagnostic procedures, financial issues and implications for everyday life. What matters the most for the patients is coping with everyday living as well as seeking and providing emotional and informational support. The approach and findings show the potential of studying social media to provide insight into patients' experiences with cancer like critical health problems.

Keywords