International Journal of Information Management Data Insights (Nov 2022)
Impact of demography on linguistic aspects and readability of reviews and performances of sentiment classifiers
Abstract
This study investigates how user demographics influence the linguistic characteristics and readability of reviews and the performance of sentiment classification methods. Two restaurant review datasets are compared, representing consumers who differ demographically in aspects such as English language fluency, sociocultural background, and geographic location. Demographic differences are observed to affect various linguistic attributes of the reviews, such as sentence structure and vocabulary usage. The Mann–Whitney U test indicates significant differences between the two sets of reviews on various linguistic aspects. In contrast, the Flesch Reading Ease (FRE) test shows no noticeable difference in the readability of the two corpora. The discrepancy in the results of lexicon-based and machine learning-based sentiment classification methods across the two corpora suggests that demography influences the performance of sentiment classifiers.
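As an illustration of the two measures named in the abstract, the following minimal Python sketch computes a Flesch Reading Ease score and a Mann–Whitney U statistic. It is not the authors' implementation. The FRE formula is the standard one (206.835 − 1.015 × words per sentence − 84.6 × syllables per word), but the vowel-group syllable counter, the regex-based tokenization, and the function names are assumptions made for this example. Only the U statistic is computed, without a p-value.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count runs of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Heuristic: a trailing 'e' is usually silent, except in '-le' endings.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard FRE formula; higher scores indicate easier text."""
    # Assumption: sentences end at '.', '!' or '?'; words are alphabetic runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def mann_whitney_u(xs, ys) -> float:
    """U statistic for xs: pairs (x, y) with x > y, counting ties as 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

In a comparison like the one described, a per-review feature (for example, words per sentence) would be extracted from each corpus and the two resulting samples passed to `mann_whitney_u`. A library routine such as `scipy.stats.mannwhitneyu` would additionally provide the p-value needed to judge significance.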