Digital Chinese Medicine (Jun 2019)

Research on Text Mining of Syndrome Element Syndrome Differentiation by Natural Language Processing

  • Deng Wen-Xiang,
  • Zhu Jian-Ping,
  • Li Jing,
  • Yuan Zhi-Ying,
  • Wu Hua-Ying,
  • Yao Zhong-Hua,
  • Zhang Yi-Ge,
  • Zhang Wen-An,
  • Huang Hui-Yong

Journal volume & issue
Vol. 2, no. 2
pp. 61 – 71

Abstract

Objective: Natural language processing (NLP) was used to mine and visualize the core content of syndrome element syndrome differentiation (SESD).

Methods: The first step was to build a text mining and analysis environment based on the Python language and to build a corpus from the core chapters of SESD. The second step was to digitalize the corpus; the main steps were word segmentation, information cleaning and merging, construction of a document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of the SESD corpus by means of word clouds, keyword extraction and visualization.

Results: NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements, such as “Qi deficiency”, “Yang deficiency” and “Yin deficiency”, were an important component of SESD. The important syndrome elements of substantiality included “Blood stasis”, “Qi stagnation”, etc. Core syndrome elements were closely related.

Conclusions: Syndrome differentiation and treatment was the core of SESD. Using NLP to mine syndrome differentiation could help reveal the internal relationships within syndrome differentiation and provide a basis for artificial intelligence to learn syndrome differentiation.

Keywords: Syndrome element syndrome differentiation (SESD), Natural language processing (NLP), Diagnostics of TCM, Artificial intelligence, Text mining
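The digitalization and keyword steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which libraries or weighting scheme were used, so TF-IDF is assumed here as one common way to obtain per-chapter keyword weights. The toy chapters, the stopword set, the synonym map and all function names are hypothetical, and the tokens are given already segmented (a real Chinese corpus would first need a word segmenter).

```python
import math
from collections import Counter

# Hypothetical, pre-segmented toy "chapters"; not data from the paper.
raw_chapters = {
    "chapter_a": ["qi_xu", "fatigue", "qi_deficiency", "and", "pale_tongue"],
    "chapter_b": ["blood_stasis", "fixed_pain", "qi_stagnation", "and", "fatigue"],
    "chapter_c": ["yin_deficiency", "night_sweat", "fatigue", "yin_deficiency"],
}
STOPWORDS = {"and"}                   # assumed cleaning rule
SYNONYMS = {"qi_xu": "qi_deficiency"}  # assumed merging rule


def clean(tokens):
    """Drop stopwords and merge synonymous terms into one entry."""
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS]


def build_matrix(docs):
    """Return (vocabulary, matrix) where each matrix row holds term counts."""
    vocabulary = sorted({t for tokens in docs.values() for t in tokens})
    matrix = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        matrix[name] = [counts[term] for term in vocabulary]
    return vocabulary, matrix


def tfidf_keywords(docs, top_n=3):
    """Rank the terms of each document by TF-IDF weight (highest first)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    ranked = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending weight, then alphabetically for stable ties.
        ranked[name] = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return ranked


chapters = {name: clean(tokens) for name, tokens in raw_chapters.items()}
vocabulary, matrix = build_matrix(chapters)
keywords = tfidf_keywords(chapters)

for name, top_terms in keywords.items():
    print(name, top_terms)
```

In this sketch a term that occurs in every chapter receives a weight of zero, which is why different chapters end up with different keywords and weights. The per-term weights could then be passed to a word cloud or other plotting tool for the visualization step.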