JMIR Formative Research (Jun 2024)

Identifying X (Formerly Twitter) Posts Relevant to Dementia and COVID-19: Machine Learning Approach

  • Mehrnoosh Azizi,
  • Ali Akbar Jamali,
  • Raymond J Spiteri

DOI
https://doi.org/10.2196/49562
Journal volume & issue
Vol. 8
p. e49562

Abstract

Background: During the pandemic, patients with dementia were identified as a vulnerable population. X (formerly Twitter) became an important source of information for people seeking updates on COVID-19, and, therefore, identifying posts (formerly tweets) relevant to dementia can be an important support for patients with dementia and their caregivers. However, mining and coding relevant posts can be daunting due to the sheer volume and high percentage of irrelevant posts.

Objective: The objective of this study was to automate the identification of posts relevant to dementia and COVID-19 using natural language processing and machine learning (ML) algorithms.

Methods: We used a combination of natural language processing and ML algorithms with manually annotated posts to identify posts relevant to dementia and COVID-19. We used 3 data sets containing more than 100,000 posts and assessed the capability of various algorithms in correctly identifying relevant posts.

Results: Our results showed that (pretrained) transfer learning algorithms outperformed traditional ML algorithms in identifying posts relevant to dementia and COVID-19. Among the algorithms tested, the transfer learning algorithm A Lite Bidirectional Encoder Representations from Transformers (ALBERT) achieved an accuracy of 82.92% and an area under the curve of 83.53%. ALBERT substantially outperformed the other algorithms tested, further emphasizing the superior performance of transfer learning algorithms in the classification of posts.

Conclusions: Transfer learning algorithms such as ALBERT are highly effective in identifying topic-specific posts, even when trained with limited or adjacent data, highlighting their superiority over other ML algorithms and applicability to other studies involving analysis of social media posts. Such an automated approach reduces the workload of manual coding of posts and facilitates their analysis for researchers and policy makers to support patients with dementia and their caregivers and other vulnerable populations.
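
The two metrics reported in the abstract, accuracy and area under the curve (AUC), can be computed from manually annotated relevance labels and a classifier's scores. The following is a minimal sketch of those definitions, not the authors' evaluation code. The function names, the 0.5 decision threshold, and the example labels and scores are all hypothetical and are not data from the study.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of posts whose thresholded score matches the manual label.

    The 0.5 threshold is an assumption made for this illustration.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed pairwise.

    Equals the probability that a randomly chosen relevant post receives a
    higher score than a randomly chosen irrelevant one; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one relevant and one irrelevant post")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical annotations (1 = relevant, 0 = irrelevant) and model scores.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"AUC      = {roc_auc(labels, scores):.4f}")
```

The pairwise AUC loop is quadratic in the number of posts, which keeps the definition easy to follow. On a collection of more than 100,000 posts, a rank-based implementation from an established library would be the more practical choice.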