IEEE Access (Jan 2024)

Enhancing Health Mention Classification Through Reexamining Misclassified Samples and Robust Fine-Tuning Pre-Trained Language Models

  • Deyu Meng,
  • Tshewang Phuntsho,
  • Tad Gonsalves

DOI
https://doi.org/10.1109/ACCESS.2024.3510388
Journal volume & issue
Vol. 12
pp. 190445 – 190453

Abstract

In public health surveillance (PHS), accurately identifying health mentions on social media is crucial for detecting health trends and outbreaks early. Health mention classification (HMC) identifies health-related content on social media and can thereby help infer the health status of users. However, traditional approaches suffer from noise introduced by keyword-based data collection, which reduces the precision of HMC. To address this challenge, we introduce a method that improves the accuracy and robustness of pre-trained language models for HMC by combining a replay buffer of misclassified samples with controlled perturbations applied to data representations. This approach lets the model keep learning from its errors and improves its ability to distinguish subtle semantic differences, significantly outperforming existing state-of-the-art and baseline models across three HMC datasets. Our findings demonstrate the method’s effectiveness in improving health mention detection and contribute to the field of explainable AI by offering insights into the models’ decision-making process. This work promises to strengthen the use of social media as a reliable tool for PHS, facilitating more proactive and informed public health responses.
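
The two ingredients named in the abstract, a replay buffer of misclassified samples and controlled perturbations of representations, can be illustrated with a minimal Python sketch. The abstract does not specify the buffer policy, the noise distribution, or where in the model the perturbation is applied, so everything here is an assumption made for illustration: the class name `MisclassifiedReplayBuffer`, the function `perturb`, the first-in-first-out eviction, and the bounded uniform noise are all hypothetical and are not the authors' implementation.

```python
import random
from collections import deque


class MisclassifiedReplayBuffer:
    """Bounded FIFO store of (text, label) pairs the model got wrong.

    Hypothetical helper: the eviction policy (oldest first) is an assumption.
    """

    def __init__(self, capacity=1000):
        # deque with maxlen silently drops the oldest item when full
        self.items = deque(maxlen=capacity)

    def add_errors(self, batch, predictions):
        """Keep only the samples whose prediction differs from the gold label."""
        for (text, label), pred in zip(batch, predictions):
            if pred != label:
                self.items.append((text, label))

    def sample(self, k, rng=random):
        """Draw up to k stored errors to mix into a later training batch."""
        k = min(k, len(self.items))
        return rng.sample(list(self.items), k)


def perturb(vector, epsilon=0.01, rng=random):
    """Add noise drawn uniformly from [-epsilon, epsilon] to each dimension.

    Assumption: uniform bounded noise stands in for the "controlled
    perturbations" mentioned in the abstract.
    """
    return [x + rng.uniform(-epsilon, epsilon) for x in vector]


# Usage: the figurative mention is misclassified as a health mention (1),
# so it is stored for replay in a later training step.
buffer = MisclassifiedReplayBuffer(capacity=100)
batch = [("I have had a fever since Monday", 1), ("Caught the football fever", 0)]
buffer.add_errors(batch, predictions=[1, 1])
replayed = buffer.sample(8)
noisy = perturb([0.2, -0.5, 1.0], epsilon=0.05)
```

In an actual fine-tuning loop, the replayed samples would be appended to each new mini-batch and the perturbation would be applied to the model's embeddings or hidden states rather than to a plain list; those details depend on the full paper and are not given in the abstract.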

Keywords