Aggregating Twitter Text through Generalized Linear Regression Models for Tweet Popularity Prediction and Automatic Topic Classification

Chen Mo; Jingjing Yin; Isaac Chun-Hai Fung; Zion Tsz Ho Tse

doi:10.3390/ejihpe11040109

European Journal of Investigation in Health, Psychology and Education (Nov 2021)

Affiliations

Chen Mo: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30458, USA
Jingjing Yin: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30458, USA
Isaac Chun-Hai Fung: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30458, USA
Zion Tsz Ho Tse: Department of Electronic Engineering, The University of York, Heslington, York YO10 5DD, UK

DOI: https://doi.org/10.3390/ejihpe11040109
Journal volume & issue: Vol. 11, no. 4
pp. 1537 – 1554

Abstract

Social media platforms have become accessible resources for health data analysis. However, the advanced computational techniques involved in big data text mining and analysis are challenging for public health data analysts to apply. This study proposes and explores the feasibility of a novel yet straightforward method that regresses the outcome of interest on aggregated influence scores, using generalized linear models for association and/or classification analyses. The method transforms text data into a continuous summary score, which substantially reduces the dimension of the document-term matrix and eases its sparsity. To illustrate the proposed method step by step, we used three Twitter datasets on different topics: autism spectrum disorder, influenza, and violence against women. We found that our results were generally consistent with the critical factors that the existing literature associates with each public health topic. The proposed method also classified tweets into topic groups appropriately, with performance consistent with that of existing text mining methods for automatic classification based on tweet content.
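
The general idea of collapsing a document-term matrix into one continuous score per document, and then regressing an outcome on that score, can be sketched as follows. This is an illustration only and not the authors' implementation: the abstract does not define how influence scores are computed, so the per-term score used here (a smoothed log-odds of the outcome among tweets containing the term), the averaging step, the toy tweets, and all function names are assumptions. The generalized linear model is a one-predictor logistic regression fitted by plain gradient ascent so that the sketch needs no external libraries.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration: (tweet text, binary outcome),
# where 1 could stand for "was retweeted" and 0 for "was not".
tweets = [
    ("flu vaccine clinic open today", 1),
    ("get your flu shot vaccine", 1),
    ("flu season vaccine reminder", 1),
    ("stuck home with flu again", 0),
    ("feeling sick flu again", 0),
    ("home sick today", 0),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def term_scores(docs):
    """Hypothetical per-term influence score: smoothed log-odds of outcome 1
    versus outcome 0 among the documents that contain the term."""
    pos, neg = Counter(), Counter()
    for text, y in docs:
        for term in set(tokenize(text)):
            (pos if y == 1 else neg)[term] += 1
    vocab = set(pos) | set(neg)
    return {t: math.log((pos[t] + 0.5) / (neg[t] + 0.5)) for t in vocab}


def aggregate_score(text, scores):
    """Collapse one document's row of the term matrix into a single number
    by averaging the scores of its distinct known terms."""
    terms = [t for t in set(tokenize(text)) if t in scores]
    return sum(scores[t] for t in terms) / len(terms) if terms else 0.0


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """One-predictor logistic regression (a GLM with a logit link),
    fitted by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


scores = term_scores(tweets)
xs = [aggregate_score(text, scores) for text, _ in tweets]
ys = [y for _, y in tweets]
b0, b1 = fit_logistic(xs, ys)


def predict(x):
    """Fitted probability of outcome 1 for an aggregated score x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


for (text, y), x in zip(tweets, xs):
    print(f"{x:+.3f}  p={predict(x):.3f}  y={y}  {text}")
```

The regression here has a single predictor regardless of vocabulary size, which is the dimension reduction the abstract describes. Note that this sketch scores terms and fits the model on the same six tweets, so the fit is optimistic by construction; a realistic analysis would estimate term scores on one split of the data and evaluate on another.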

Published in European Journal of Investigation in Health, Psychology and Education

ISSN: 2174-8144 (Print); 2254-9625 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Public aspects of medicine; Philosophy. Psychology. Religion: Psychology
Website: https://www.mdpi.com/journal/ejihpe
