Scientific Data (May 2024)

A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints

  • Wenxi Wang,
  • Zhan Zhao,
  • Huansheng Ning

DOI
https://doi.org/10.1038/s41597-024-03321-0
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 12

Abstract

Read online

Abstract Prolonged and over-excessive interaction with cyberspace poses a threat to people’s health and leads to the occurrence of Cyber-Syndrome, which covers not only physiological but also psychological disorders. This paper aims to create a tree-shaped gold-standard corpus that annotates the Cyber-Syndrome, clinical manifestations, and acupoints that can alleviate their symptoms or signs, designating this corpus as CS-A. In the CS-A corpus, this paper defines six entities and relations subject to annotation. There are 448 texts to annotate in total manually. After three rounds of updating the annotation guidelines, the inter-annotator agreement (IAA) improved significantly, resulting in a higher IAA score of 86.05%. The purpose of constructing CS-A corpus is to increase the popularity of Cyber-Syndrome and draw attention to its subtle impact on people’s health. Meanwhile, annotated corpus promotes the development of natural language processing technology. Some model experiments can be implemented based on this corpus, such as optimizing and improving models for discontinuous entity recognition, nested entity recognition, etc. The CS-A corpus has been uploaded to figshare.