Navigating pathways to automated personality prediction: a comparative study of small and medium language models

Fatima Habib; Zeeshan Ali; Akbar Azam; Komal Kamran; Fahad Mansoor Pasha

doi:10.3389/fdata.2024.1387325

Frontiers in Big Data (Sep 2024)

Affiliations

Fatima Habib: FAST School of Management, National University of Computer and Emerging Sciences, Lahore, Pakistan
Zeeshan Ali: Oxford Brookes Business School, Oxford Brookes University, Oxford, United Kingdom
Akbar Azam: FAST School of Management, National University of Computer and Emerging Sciences, Lahore, Pakistan
Komal Kamran: FAST School of Management, National University of Computer and Emerging Sciences, Lahore, Pakistan
Fahad Mansoor Pasha: Faculty of Business Administration, Lahore School of Economics, Lahore, Pakistan

DOI: https://doi.org/10.3389/fdata.2024.1387325
Journal volume & issue: Vol. 7

Abstract

Introduction: Recent advancements in Natural Language Processing (NLP) and widely available social media data have made it possible to predict human personalities in various computational applications. In this context, pre-trained Large Language Models (LLMs) have gained recognition for their exceptional performance in NLP benchmarks. However, these models require substantial computational resources, escalating their carbon and water footprint. Consequently, a shift toward more computationally efficient smaller models is observed.

Methods: This study compares a small model, ALBERT (11.8M parameters), with a larger model, RoBERTa (125M parameters), in predicting the Big Five personality traits. It uses the PANDORA dataset of Reddit comments, processed on a Tesla P100-PCIE-16GB GPU. Both models were customized to support multi-output regression, with two linear layers added for fine-grained regression analysis.

Results: Results are evaluated on Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), considering the computational resources consumed during training. While ALBERT consumed less system memory and emitted less heat, it took longer to compute than RoBERTa. The two models produced comparable levels of MSE, RMSE, and training loss reduction.

Discussion: This highlights the influence of training data quality on the model's performance, outweighing the significance of model size. Theoretical and practical implications are also discussed.
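
As an illustration of the two evaluation metrics named in the abstract, the following minimal sketch computes MSE and RMSE for a multi-output regression over the five traits. It is not the authors' code: the trait ordering, the function names, and the toy score values are all assumptions made for this example, and the abstract does not state the scale of the trait scores or how errors were averaged across traits.

```python
import math

# Assumed trait ordering for this sketch; the abstract does not specify one.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def per_trait_mse(y_true, y_pred):
    """Return the mean squared error for each trait column.

    y_true and y_pred are lists of rows, one row per sample,
    each row holding one score per trait in TRAITS order.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    totals = [0.0] * len(TRAITS)
    for true_row, pred_row in zip(y_true, y_pred):
        for i, (t, p) in enumerate(zip(true_row, pred_row)):
            totals[i] += (t - p) ** 2
    return [total / len(y_true) for total in totals]


def overall_mse(y_true, y_pred):
    """Average the per-trait MSE values into a single score (uniform weighting)."""
    per_trait = per_trait_mse(y_true, y_pred)
    return sum(per_trait) / len(per_trait)


def rmse(mse):
    """RMSE is the square root of MSE, expressed in the units of the target."""
    return math.sqrt(mse)


# Hypothetical toy data: two samples, five trait scores each.
y_true = [[50, 60, 70, 80, 90], [40, 50, 60, 70, 80]]
y_pred = [[53, 60, 70, 80, 90], [44, 50, 60, 70, 80]]

mse_by_trait = per_trait_mse(y_true, y_pred)
mse_all = overall_mse(y_true, y_pred)
rmse_all = rmse(mse_all)
```

Reporting both values is a common choice because MSE penalizes large errors more heavily, while RMSE is easier to interpret against the original score range.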

Published in Frontiers in Big Data

ISSN: 2624-909X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.frontiersin.org/journals/big-data
