The sociolinguistic foundations of language modeling

Jack Grieve; Sara Bartl; Matteo Fuoli; Jason Grafmiller; Weihang Huang; Alejandro Jawerbaum; Akira Murakami; Marcus Perlman; Dana Roemling; Bodo Winter

doi:10.3389/frai.2024.1472411

Frontiers in Artificial Intelligence (Jan 2025)

DOI: https://doi.org/10.3389/frai.2024.1472411
Journal volume & issue: Vol. 7

Abstract

In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.

Published in Frontiers in Artificial Intelligence

ISSN: 2624-8212 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/artificial-intelligence#
