Machine Learning and Knowledge Extraction (Jan 2024)

An Evaluative Baseline for Sentence-Level Semantic Division

  • Kuangsheng Cai,
  • Zugang Chen,
  • Hengliang Guo,
  • Shaohua Wang,
  • Guoqing Li,
  • Jing Li,
  • Feng Chen,
  • Hang Feng

DOI
https://doi.org/10.3390/make6010003
Journal volume & issue
Vol. 6, no. 1
pp. 41 – 52

Abstract

Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. The distribution of text into semantic grids is key to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100) to evaluate the validity of distributing text into semantic grids, and we apply classical division algorithms to SSDB-100. To our knowledge, SSDB-100 is currently the only dataset that supports validation of sentence-level SFT algorithms. In this article, we describe the construction of SSDB-100. First, a semantic division questionnaire with broad coverage was generated by limiting the uncertainty range of the topics and the corpus. Next, 11 human experts provided feedback through an expert survey. Finally, we analyzed and processed the feedback; after invalid feedback was eliminated, the average consistency index of the remaining feedback was 0.856. SSDB-100 has 100 semantic grids with clear distinctions between them, which allows the dataset to be extended using semantic methods.

Keywords