Scientific Data (May 2024)

ChineseEEG: A Chinese Linguistic Corpora EEG Dataset for Semantic Alignment and Neural Decoding

  • Xinyu Mou,
  • Cuilin He,
  • Liwei Tan,
  • Junjie Yu,
  • Huadong Liang,
  • Jianyu Zhang,
  • Yan Tian,
  • Yu-Fang Yang,
  • Ting Xu,
  • Qing Wang,
  • Miao Cao,
  • Zijiao Chen,
  • Chuan-Peng Hu,
  • Xindi Wang,
  • Quanying Liu,
  • Haiyan Wu

DOI
https://doi.org/10.1038/s41597-024-03398-7
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 13

Abstract

An electroencephalography (EEG) dataset built on rich text stimuli can advance the understanding of how the brain encodes semantic information and contribute to semantic decoding in brain-computer interfaces (BCIs). To address the scarcity of EEG datasets featuring Chinese linguistic stimuli, we present the ChineseEEG dataset, a high-density EEG dataset complemented by simultaneous eye-tracking recordings. The dataset was compiled while 10 participants silently read approximately 13 hours of Chinese text from two well-known novels. It provides long-duration EEG recordings, along with pre-processed EEG sensor-level data and semantic embeddings of the reading materials extracted by a pre-trained natural language processing (NLP) model. As a pilot EEG dataset derived from natural Chinese linguistic stimuli, ChineseEEG can significantly support research across neuroscience, NLP, and linguistics. It establishes a benchmark dataset for Chinese semantic decoding, aids in the development of BCIs, and facilitates the exploration of alignment between large language models and human cognitive processes. It can also aid research into the brain’s mechanisms of language processing in the context of natural Chinese language.