Lexis: Journal in English Lexicology (Dec 2020)

Lexical Emergence on Reddit: An Analysis of Lexical Change on the “Front Page of the Internet”

  • Hanna Mahler

DOI
https://doi.org/10.4000/lexis.4917
Journal volume & issue
Vol. 16

Abstract

Recent advances in the availability and size of electronic corpora, especially those containing computer-mediated language, open up new possibilities for the study of change in the English lexicon [Allan & Robinson 2012: 4]. In line with these developments, Grieve et al. [2017] present a methodology for finding “emerging lexemes” and apply it to a corpus of American Twitter data from 2013 to 2014. Their methodology entails searching for word forms that start off with a low overall frequency and that show a high correlation coefficient with their rank in the time series over the whole year [Grieve et al. 2017: 103-105]. Working with a one-year section of the Pushshift Reddit Dataset [Baumgartner et al. 2020], this study applies the proposed methodology to a different online platform, Reddit. The present paper therefore has two aims: to test the methodology proposed by Grieve et al. [2017] and to investigate recent lexical emergence on Reddit. This also allows for a comparison between Reddit and Twitter, providing further insights into the context-dependence of lexical emergence in the online environment. Furthermore, testing and refining the methodology for discovering emerging lexemes yields valuable insights for scholars looking to use this procedure in the future.

Applying the methodology to the Pushshift Reddit Dataset yields a total of eight emerging lexemes: six result primarily from onomasiological change, while two appear to be the outcome of semasiological change. The formal characteristics of the emerging lexemes (word class, word formation process) are overall very similar to the features identified by Grieve et al. [2017: 108-109], whereas their trajectories over the time period investigated vary noticeably and do not follow the s-shaped curves that are commonly proposed [e.g. Blythe & Croft 2012] and that are also attested by Grieve et al. [2017: 116]. The semantic domains of the identified lexemes differ considerably from the results of Grieve et al. [2017: 107-108], which can be explained by the different profiles of the two platforms and their users.

Several caveats were identified for the application of the methodology by Grieve et al. [2017]. First, word class ambiguity is likely to distort the frequencies obtained. Second, attestation in a representative corpus is proposed as a more realistic criterion for classifying a word as ‘established’ than inclusion in standard dictionaries. Third, the methodology only allows for the detection of single-word units, which does not accurately represent the changes taking place, as several of the emerging lexemes appear to be part of compounds.

Keywords