Asian-Pacific Journal of Second and Foreign Language Education (Oct 2023)

Developing and validating a mid-frequency word list for chemistry: a corpus-based approach using big data

  • Ismail Xodabande,
  • Mahmood Reza Atai,
  • Mohammad R. Hashemi,
  • Paul Thompson

DOI
https://doi.org/10.1186/s40862-023-00205-5
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 21

Abstract


Given the importance of specialized vocabulary in scientific communication and academic discourse, there is a growing need for word lists that address the vocabulary-learning needs of university students and researchers in different subject areas. The current study analyzed a corpus of chemistry research articles (278 million running words) to establish a mid-frequency vocabulary list for this field. Using frequency, range, and dispersion criteria, the study identified 560 lemmas from the fourth to ninth 1,000-word British National Corpus/Corpus of Contemporary American English (BNC/COCA) lists, which together provided 6.4% coverage of all words in the corpus. The list was validated using specialized and general corpora, and the results confirmed the value and relevance of the items for chemistry. Moreover, to support pedagogical use of the list, the vocabulary items were divided into five bands based on their coverage and importance. The 100 words in the first band were the most important mid-frequency vocabulary in chemistry, providing 3.05% coverage. The study highlights the significant contribution of mid-frequency words to research articles, and the findings have implications for using large corpora as a big-data source for identifying specialized, field-specific vocabulary.
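The selection and coverage steps described above can be illustrated with a minimal sketch. The abstract names frequency, range, and dispersion criteria but gives neither the thresholds nor the dispersion statistic, so several things here are assumptions: Juilland's D is used as the dispersion measure (a common choice in word-list research, not confirmed by the abstract), the threshold values are arbitrary, the corpus and BNC/COCA band lookup are toy data, and the banding scheme (equal-size consecutive chunks) is hypothetical and does not reproduce the study's five bands. All function names are illustrative only.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def juilland_d(rel_freqs: list[float]) -> float:
    """Juilland's D = 1 - V / sqrt(n - 1), where V = sd / mean across n corpus parts."""
    n = len(rel_freqs)
    if n < 2:
        return 0.0
    m = mean(rel_freqs)
    if m == 0:
        return 0.0
    return 1 - (pstdev(rel_freqs) / m) / sqrt(n - 1)


def select_mid_frequency(parts, bnc_coca_band, min_freq, min_range, min_d,
                         mid_bands=range(4, 10)):
    """Keep lemmas from the 4th-9th 1,000-word bands that pass all three criteria."""
    part_counts = [Counter(p) for p in parts]
    totals = sum(part_counts, Counter())
    selected = []
    for lemma, freq in totals.items():
        if bnc_coca_band.get(lemma) not in mid_bands:
            continue  # high-frequency, off-list, or unknown lemma
        rng = sum(1 for c in part_counts if c[lemma] > 0)  # range: parts containing it
        # Relative frequency per part, so unequal part sizes do not skew dispersion.
        rel = [c[lemma] / len(p) for c, p in zip(part_counts, parts)]
        if freq >= min_freq and rng >= min_range and juilland_d(rel) >= min_d:
            selected.append(lemma)
    # Rank by corpus frequency (ties broken alphabetically).
    return sorted(selected, key=lambda w: (-totals[w], w))


def coverage(parts, wordlist) -> float:
    """Share of all running tokens that belong to the word list."""
    words = set(wordlist)
    tokens = [t for p in parts for t in p]
    return sum(t in words for t in tokens) / len(tokens)


def split_into_bands(ranked, band_size):
    """Cut a frequency-ranked list into consecutive bands (hypothetical scheme)."""
    return [ranked[i:i + band_size] for i in range(0, len(ranked), band_size)]


# Toy, pre-lemmatized corpus split into three parts (e.g. three articles).
parts = [
    ["the", "catalyst", "solvent", "the", "yield", "ligand"],
    ["the", "catalyst", "solvent", "yield", "titration", "the"],
    ["the", "catalyst", "solvent", "ligand", "the", "yield"],
]
# Hypothetical BNC/COCA band assignments (1 = first 1,000 words, etc.).
bands = {"the": 1, "yield": 3, "catalyst": 5, "solvent": 6, "ligand": 7, "titration": 8}

selected = select_mid_frequency(parts, bands, min_freq=2, min_range=2, min_d=0.6)
cov = coverage(parts, selected)
print(selected, round(cov, 3))        # ['catalyst', 'solvent'] 0.333
print(split_into_bands(selected, 1))  # [['catalyst'], ['solvent']]
```

In this toy run, "ligand" meets the frequency and range thresholds but is filtered out because it is unevenly spread (D = 0.5), and "titration" occurs only once. The same coverage function is what a figure such as "6.4% of all words" would be computed with, applied to the full corpus and list.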

Keywords