大数据 (Sep 2024)
Building domain lexicon oriented to behavioral features in depression
Abstract
Behavioral representations of patients with depression reflect their clinical features and condition, and are therefore useful for diagnosis. However, current depression lexicons overlook the correlation between behavioral features and patients' condition in depression texts, which leaves the lexicon information incomplete. To address this problem, a method for constructing a domain lexicon oriented to behavioral features in depression was proposed, aiming to extend the domain lexicon's coverage of emotional expressions. First, seed word sets for sentiment and for behavior were each constructed with the TF-IDF algorithm, and the sentiment word set was obtained by calculating the PMI similarity between the sentiment seed word set and an existing sentiment lexicon. Second, the behavioral seed words were labeled according to the correspondence between behavioral features and patients' condition, and were then input into WoBERT, together with the depression texts, to generate dynamic word vectors for each. The candidate word set was then acquired by calculating the similarity between the behavioral seed word set and the depression texts. Next, based on the similarity between words, a semantic graph was constructed, and the behavioral-feature word set was obtained with a label propagation algorithm. Finally, emoticons expressing negative emotions on Weibo were collected to build the emoticon word set. The sentiment, behavioral-feature, and emoticon word sets were integrated into the Chinese Depression Domain Lexicon. Experimental results show that the constructed lexicon can improve the performance of depression text classification.
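The graph-based step can be illustrated with a minimal sketch. It assumes that word vectors are already available (toy 3-dimensional vectors stand in here for the WoBERT vectors), that edges are created when cosine similarity reaches a threshold, and that labels spread from the labeled seed words by similarity-weighted voting. The function names, the threshold value, the label names, and the example words are all hypothetical and are not taken from the paper; the actual propagation rule used by the authors may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_labels(vectors, seed_labels, threshold=0.5, max_iter=50):
    """Spread seed labels over a similarity graph (illustrative only)."""
    words = list(vectors)

    # Build the semantic graph: connect word pairs whose similarity
    # reaches the (assumed) threshold, weighting each edge by similarity.
    edges = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                edges[a][b] = sim
                edges[b][a] = sim

    labels = dict(seed_labels)
    for _ in range(max_iter):
        changed = False
        for w in words:
            if w in seed_labels:
                continue  # seed labels stay fixed
            # Sum edge weights per label among already-labeled neighbours.
            scores = {}
            for neighbour, sim in edges[w].items():
                if neighbour in labels:
                    lab = labels[neighbour]
                    scores[lab] = scores.get(lab, 0.0) + sim
            if scores:
                # Sorting first makes tie-breaking deterministic.
                best = max(sorted(scores), key=scores.get)
                if labels.get(w) != best:
                    labels[w] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy vectors standing in for contextual word embeddings (assumption).
vectors = {
    "insomnia": [1.0, 0.1, 0.0],
    "sleepless": [0.9, 0.2, 0.0],
    "jogging": [0.1, 1.0, 0.0],
    "running": [0.2, 0.9, 0.0],
    "weather": [0.0, 0.0, 1.0],
}
# Hypothetical seed labels; the paper's labeling scheme is not given here.
seeds = {"insomnia": "depressive", "jogging": "neutral"}

labels = propagate_labels(vectors, seeds)
for word in vectors:
    print(word, labels.get(word, "unlabeled"))
```

In this toy run, words close to a seed inherit its label, while a word with no sufficiently similar neighbour stays unlabeled and would simply not enter the behavioral-feature word set.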