A Study on the Automatic Classification of Tweets Related to Mental Health Literacy

Wei-Hung Tseng; Yin-Ju Lien; Chao-Hui Chen; Jiann-Cherng Shieh; Yuen-Hsien Tseng

doi:10.6120/JoEMLS.202403_61(1).0004.RS.CM

Jiàoyù zīliào yǔ túshūguǎn xué (Mar 2024)


Affiliations

Wei-Hung Tseng: Graduate Institute of Library & Information Studies, National Taiwan Normal University, Taipei, Taiwan
Yin-Ju Lien: Department of Health Promotion and Health Education, National Taiwan Normal University, Taipei, Taiwan
Chao-Hui Chen: Department of Health Promotion and Health Education, National Taiwan Normal University, Taipei, Taiwan
Jiann-Cherng Shieh: Graduate Institute of Library & Information Studies, National Taiwan Normal University, Taipei, Taiwan
Yuen-Hsien Tseng: Graduate Institute of Library & Information Studies, National Taiwan Normal University, Taipei, Taiwan

Journal volume & issue: Vol. 61, no. 1
pp. 5–27

Abstract


Twitter users often post tweets describing their moods. Analyzing these tweets can help reveal an individual's psychological state, which benefits research aimed at promoting public mental health. This study automatically classifies short tweet texts related to mental health literacy. It uses traditional machine learning alongside AI techniques such as BERT, SetFit, GPT-3, and GPT-4. These methods assign the tweets to 11 items spanning five dimensions, and each item is rated on five levels of relevance intensity. The goal is for machine predictions to reach a performance of 0.8 or above despite limited human-annotated training data, so that the machine can effectively assist mental health research. The results show that with SetFit, most items reach the target Macro F1 of about 0.8, and only two items score around 0.65. One contribution of this study is to present and compare how effectively various natural language processing paradigms comprehend and analyze text on these difficult tasks.
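The abstract reports results as Macro F1 but does not spell out how it is averaged. The sketch below assumes the standard definition: compute F1 separately for each class, then take the unweighted mean. Under that definition, each of an item's five intensity levels counts equally, even a level that appears rarely in the data. The function name `macro_f1` and the labels 1 to 5 are hypothetical and for illustration only. The convention of scoring a class with no predictions or no true instances as 0 is also an assumption. scikit-learn's `f1_score(..., average="macro")` computes the same quantity.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (Macro F1).

    Assumption: a class whose precision or recall is undefined
    (zero denominator) contributes 0 to that term.
    """
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Every class is weighted equally, regardless of how often it occurs.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical intensity scores (1-5) for six tweets on one item.
gold = [1, 2, 3, 4, 5, 5]
pred = [1, 2, 3, 4, 4, 5]
print(round(macro_f1(gold, pred, labels=[1, 2, 3, 4, 5]), 4))  # 0.8667
```

In this toy run, a single error that confuses levels 4 and 5 lowers the F1 of both classes to 2/3. The Macro F1 therefore falls to 13/15, which is lower than the plain accuracy of 5/6.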

Published in Jiàoyù zīliào yǔ túshūguǎn xué

ISSN: 1013-090X (Print); 2309-9100 (Online)
Publisher: Tamkang University Press
Country of publisher: Taiwan, Province of China
LCC subjects: Bibliography. Library science. Information resources
Website: http://joemls.dils.tku.edu.tw/
