Estudios de Lingüística (Jul 2022)

A corpus-based study of 4-grams in the research article genre

  • Eva Lucía Jiménez-Navarro

DOI
https://doi.org/10.14198/ELUA.22267
Journal volume & issue
no. 38
p. 241

Abstract


The analysis of phraseology in the specialized discourse of science has sparked researchers’ interest over the last few decades, probably because the use of word groupings in specific registers can provide information about certain typical features of the genre. For instance, Gledhill (2009) explores colligations of tenses in scientific articles and discovers that the present tense is used for qualitative and empirical expressions, while the past tense provides quantitative and research-oriented descriptions; Pérez-Llantada (2014) investigates 4-word lexical bundles in research articles, finding that these multiword combinations express referential meaning and organize the text; finally, Jiménez-Navarro (2019) analyzes adjective + noun collocations in a corpus of scientific papers and concludes that these phraseological units convey specific meanings when used in this genre, since they represent the contents of research articles. The aim of the current study is to contribute to the analysis of 4-grams in the language of science. To this end, two specific objectives are defined: first, to ascertain the structure of 4-grams; second, to analyze the function they perform. The methodology was corpus-based and entailed five major steps: (1) a specialized corpus of research articles was built, (2) a list of 4-grams was automatically extracted using the software Sketch Engine, (3) the resulting list was manually verified in order to discard inaccurate candidates, (4) the selected units were classified according to their structural framework, and (5) the selected units were categorized according to their function in the text. The findings show that, in terms of the first objective, the most typical 4-grams were noun phrases; as for the second objective, the sequences examined mostly concerned the research conducted and the authorship of the texts. All in all, the 4-grams identified were structures that were specific to the genre under study but could also be used in other domains.
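The automatic extraction step (step 2) can be illustrated with a minimal sketch. The study itself used Sketch Engine; the plain-Python code below is only a stand-in and not the author's procedure. The tokenizer, the `min_freq` threshold, the function names, and the two sample sentences are all assumptions made for illustration. Manual verification and the structural and functional classification (steps 3 to 5) are not modeled here.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase word forms, allowing internal
    # hyphens or apostrophes. Sketch Engine's tokenization may differ.
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def extract_ngrams(texts, n=4, min_freq=2):
    """Count contiguous n-word sequences and keep those at or above min_freq."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Slide the window inside one document at a time, so that no
        # n-gram spans the boundary between two articles.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return [(" ".join(gram), freq)
            for gram, freq in counts.most_common()
            if freq >= min_freq]


# Invented sample sentences standing in for a corpus of research articles.
corpus = [
    "The results of the present study show that the results of the analysis differ.",
    "In this paper, the results of the present study are discussed.",
]

candidates = extract_ngrams(corpus, n=4, min_freq=2)
for gram, freq in candidates:
    print(freq, gram)
```

The output of such a step is only a candidate list: frequency alone does not separate meaningful units from accidental sequences, which is why a manual verification stage follows in the methodology described above.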