A Semantic and Syntactic Similarity Measure for Political Tweets

Claire Little; David Mclean; Keeley Crockett; Bruce Edmonds

doi:10.1109/ACCESS.2020.3017797

IEEE Access (Jan 2020)

Affiliations

Claire Little: Department of Economics, Policy and International Business, Centre for Policy Modelling, Manchester Metropolitan University, Manchester, U.K.
David Mclean: Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, U.K.
Keeley Crockett: Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, U.K.
Bruce Edmonds: Department of Economics, Policy and International Business, Centre for Policy Modelling, Manchester Metropolitan University, Manchester, U.K.

DOI: https://doi.org/10.1109/ACCESS.2020.3017797
Journal volume & issue: Vol. 8
pp. 154095 – 154113

Abstract

Measurement of the semantic and syntactic similarity of human utterances is essential in allowing machines to understand dialogue with users. However, human language is complex, and the semantic meaning of an utterance usually depends on the context at a given time and on learnt experience of the meaning of the words used. This is particularly challenging when automatically understanding the meaning of social media text, such as tweets, which can contain non-standard language. Short Text Semantic Similarity measures can be adapted to measure the degree of similarity of a pair of tweets. This work presents a new Semantic and Syntactic Similarity Measure (TSSSM) for political tweets. The approach uses word embeddings to determine semantic similarity and extracts syntactic features to overcome the limitations of current measures, which may miss identical sequences of words. A large dataset of tweets focusing on the political domain was collected, pre-processed and used to train the word embedding model, with various experiments performed to determine the optimal model and parameters. A selection of tweet pairs was evaluated by humans for semantic equivalence, and the human ratings were correlated against the measure. The new measure can be used in a variety of applications, including identifying and analyzing political narratives. Experiments on three diverse human-labelled test datasets demonstrate that the measure outperforms an existing measure, performs well on tweets from the political domain and may also generalize outside the political domain.
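The general idea described in the abstract, combining an embedding-based semantic score with a syntactic feature that rewards shared word sequences, can be illustrated with a minimal Python sketch. This is not the authors' TSSSM: the toy three-dimensional vectors, the averaged-vector cosine score, the bigram Jaccard overlap, the equal weighting and every function name below are assumptions made purely for illustration. A real system would use embeddings trained on a tweet corpus.

```python
import math

# Toy 3-dimensional "embeddings" with made-up values (illustration only;
# a real model would be trained on a large tweet corpus).
TOY_VECTORS = {
    "vote": [0.9, 0.1, 0.0],
    "ballot": [0.8, 0.2, 0.1],
    "leave": [0.1, 0.9, 0.0],
    "remain": [0.0, 0.8, 0.3],
    "now": [0.2, 0.2, 0.9],
}


def mean_vector(tokens, vectors):
    """Average the vectors of all tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_similarity(tokens_a, tokens_b, vectors):
    """Semantic score: cosine of the two averaged word vectors."""
    vec_a = mean_vector(tokens_a, vectors)
    vec_b = mean_vector(tokens_b, vectors)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


def ngram_overlap(tokens_a, tokens_b, n=2):
    """Syntactic score: Jaccard overlap of word n-grams (shared sequences)."""
    grams_a = {tuple(tokens_a[i:i + n]) for i in range(len(tokens_a) - n + 1)}
    grams_b = {tuple(tokens_b[i:i + n]) for i in range(len(tokens_b) - n + 1)}
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def combined_similarity(text_a, text_b, vectors=TOY_VECTORS, weight=0.5):
    """Weighted mix of the semantic and syntactic scores (assumed weighting)."""
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    semantic = semantic_similarity(tokens_a, tokens_b, vectors)
    syntactic = ngram_overlap(tokens_a, tokens_b)
    return weight * semantic + (1 - weight) * syntactic
```

Calling `combined_similarity("vote leave now", "ballot leave now")` scores the pair on both components: the averaged vectors are close because "vote" and "ballot" have similar toy embeddings, and the shared sequence "leave now" contributes through the bigram overlap. The `weight` parameter is a hypothetical knob, not a parameter reported in the paper.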

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639