An Evaluative Baseline for Sentence-Level Semantic Division

Kuangsheng Cai; Zugang Chen; Hengliang Guo; Shaohua Wang; Guoqing Li; Jing Li; Feng Chen; Hang Feng

doi:10.3390/make6010003

Machine Learning and Knowledge Extraction (Jan 2024)

Affiliations

Kuangsheng Cai: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Zugang Chen: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Hengliang Guo: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
Shaohua Wang: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Guoqing Li: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Jing Li: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Feng Chen: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
Hang Feng: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China

DOI: https://doi.org/10.3390/make6010003
Journal volume & issue: Vol. 6, no. 1, pp. 41–52

Abstract

Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. Distributing text into semantic grids is central to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100), the only dataset we are currently aware of for validating sentence-level SFT algorithms. SSDB-100 is intended for evaluating the validity of text distribution in semantic grids, and we apply classical division algorithms to it. In this article, we describe the construction of SSDB-100. First, a semantic division questionnaire with broad coverage was generated by limiting the uncertainty range of the topics and corpus. Next, 11 human experts provided feedback through an expert survey. Finally, we analyzed and processed the feedback; after invalid feedback was eliminated, the average consistency index of the retained feedback was 0.856. SSDB-100 has 100 clearly distinguished semantic grids, which allows the dataset to be extended using semantic methods.

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
