PLoS ONE (Jan 2018)
Statistical inferences for polarity identification in natural language.
Abstract
Information forms the basis for all human behavior, including the ubiquitous decision-making that people constantly perform in their everyday lives. It is thus the mission of researchers to understand how humans process information to reach decisions. To facilitate this task, this work proposes LASSO regularization as a statistical tool to extract decisive words from textual content and thereby study the reception of granular expressions in natural language. This differs from the usual use of the LASSO as a predictive model and, instead, yields highly interpretable statistical inferences between the occurrences of words and an outcome variable. Accordingly, the method suggests direct implications for the social sciences: it serves as a statistical procedure for generating domain-specific dictionaries, as opposed to the frequently employed heuristics. In addition, researchers can now identify text segments and word choices that are statistically decisive to authors or readers and, based on this knowledge, test hypotheses from behavioral research.
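To make the idea of extracting decisive words concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary word-occurrence representation, a plain coordinate-descent solver for the LASSO objective (1/(2n))·||y − Xβ||² + λ·||β||₁, and a toy corpus. The function names (`soft_threshold`, `lasso_word_weights`), the penalty value `lam`, and the example documents and outcomes are all hypothetical and chosen only for illustration.

```python
# Minimal sketch: LASSO via coordinate descent on binary word occurrences.
# Everything here (names, toy data, lam) is an illustrative assumption,
# not the procedure or data used in the paper.

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_word_weights(docs, outcomes, lam=0.1, n_iter=200):
    """Return {word: coefficient} for words with nonzero LASSO coefficients."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    n = len(docs)
    # One centered binary occurrence column per word; centering the columns
    # and the outcome removes the need for an explicit intercept.
    cols = []
    for w in vocab:
        col = [1.0 if w in t else 0.0 for t in tokens]
        mean = sum(col) / n
        cols.append([c - mean for c in col])
    y_mean = sum(outcomes) / n
    resid = [y - y_mean for y in outcomes]  # residual with all coefficients at zero
    beta = [0.0] * len(vocab)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # word occurs in every document; constant column, no signal
            # Covariance of column j with the partial residual
            # (current residual plus this word's own contribution).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return {w: b for w, b in zip(vocab, beta) if b != 0.0}

# Hypothetical toy corpus: short texts paired with a numeric outcome.
docs = [
    "great movie", "terrible movie",
    "great film", "terrible film",
    "okay movie", "okay film",
]
outcomes = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0]

weights = lasso_word_weights(docs, outcomes, lam=0.1)
print(weights)
```

In this toy setting, the penalty drives the coefficients of the uninformative words ("movie", "film", "okay") to exactly zero, while "great" keeps a positive and "terrible" a negative coefficient. The surviving words and their signs are what a domain-specific dictionary would be built from; the statistical inference step described in the abstract is not covered by this sketch.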