Scientific Data (Feb 2023)

A large dataset of semantic ratings and its computational extension

  • Shaonan Wang,
  • Yunhao Zhang,
  • Weiting Shi,
  • Guangyao Zhang,
  • Jiajun Zhang,
  • Nan Lin,
  • Chengqing Zong

DOI
https://doi.org/10.1038/s41597-023-01995-6
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 11

Abstract

Read online

Abstract Evidence from psychology and cognitive neuroscience indicates that the human brain’s semantic system contains several specific subsystems, each representing a particular dimension of semantic information. Word ratings on these different semantic dimensions can help investigate the behavioral and neural impacts of semantic dimensions on language processes and build computational representations of language meaning according to the semantic space of the human cognitive system. Existing semantic rating databases provide ratings for hundreds to thousands of words, which can hardly support a comprehensive semantic analysis of natural texts or speech. This article reports a large database, the Six Semantic Dimension Database (SSDD), which contains subjective ratings for 17,940 commonly used Chinese words on six major semantic dimensions: vision, motor, socialness, emotion, time, and space. Furthermore, using computational models to learn the mapping relations between subjective ratings and word embeddings, we include the estimated semantic ratings for 1,427,992 Chinese and 1,515,633 English words in the SSDD. The SSDD will aid studies on natural language processing, text analysis, and semantic representation in the brain.