Social Sciences and Humanities Open (Jan 2021)
Comparison of machine learning algorithms for content based personality resolution of tweets
Abstract
Social media (SM) content is expanding quickly as individuals share their feelings in a variety of ways, all of which reflect their personalities to varying degrees. This study set out to build a system that predicts an individual's personality from SM conversations. Four Big Five personality traits (i.e. Extraversion (EXT), Conscientiousness (CON), Agreeableness (AGR) and Openness to Experience (OPN), equivalent to the Myers–Briggs Type Indicator (MBTI)) were predicted using six supervised machine learning (SML) algorithms. To handle unstructured and unbalanced SM conversations, three feature extraction methods were used: term frequency-inverse document frequency (TF-IDF), bag of words (BOW) and global vectors for word representation (GloVe). TF-IDF feature extraction produced 2–9% higher accuracy than the word2vec representation. GloVe is advocated as a better feature extractor because it maintains the spatial information of words.
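To make the difference between two of the feature extraction methods named above concrete, the following is a minimal sketch of BOW and TF-IDF vectors computed in plain Python. It is an illustration under stated assumptions, not the study's pipeline: the tweets are made up, the helper names (`tokenize`, `bag_of_words`, `tf_idf`) are hypothetical, and the unsmoothed weighting tf × log(N / df) is only one common TF-IDF variant, since the abstract does not say which one was used.

```python
import math
from collections import Counter

# Toy, made-up tweets for illustration only (not data from the study).
tweets = [
    "love meeting new people at parties",
    "planning my week with a careful list",
    "love exploring new ideas and art",
]

def tokenize(text):
    # Lowercase whitespace tokenization; real tweets would need more cleaning.
    return text.lower().split()

def bag_of_words(docs):
    # BOW: each document becomes a vector of raw term counts
    # over a vocabulary shared by all documents.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tf_idf(docs):
    # TF-IDF (assumed variant): relative term frequency in the document
    # multiplied by log(N / number of documents containing the term).
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    vectors = []
    for row in counts:
        total = sum(row)
        vectors.append([(c / total) * w for c, w in zip(row, idf)])
    return vocab, vectors

vocab, bow = bag_of_words(tweets)
_, tfidf = tf_idf(tweets)

# Compare the weight of a shared word ("love") with a rarer one ("parties").
for term in ("love", "parties"):
    j = vocab.index(term)
    print(term, "BOW:", bow[0][j], "TF-IDF:", round(tfidf[0][j], 4))
```

In the first tweet both words have a BOW count of 1, but TF-IDF gives "parties" a larger weight than "love" because "love" also appears in another tweet. Either kind of vector could then be passed to a supervised classifier.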