IEEE Access (Jan 2021)

Cancer Registry Coding via Hybrid Neural Symbolic Systems in the Cross-Hospital Setting

  • Hong-Jie Dai,
  • Yi-Hsin Yang,
  • Ti-Hao Wang,
  • Yan-Jie Lin,
  • Pin-Jou Lu,
  • Chung-Yang Wu,
  • Yu-Cheng Chang,
  • You-Qian Lee,
  • You-Chen Zhang,
  • Yuan-Chi Hsu,
  • Han-Hsiang Wu,
  • Cheng-Rong Ke,
  • Chih-Jen Huang,
  • Yu-Tsang Wang,
  • Sheau-Fang Yang,
  • Kuan-Chung Hsiao,
  • Ko-Jiunn Liu,
  • Li-Tzong Chen,
  • I-Shou Chang,
  • K. S. Clifford Chao,
  • Tsang-Wu Liu

DOI
https://doi.org/10.1109/ACCESS.2021.3099175
Journal volume & issue
Vol. 9
pp. 112081 – 112096

Abstract

Read online

Cancer registries are critical databases for cancer research whose maintenance requires various types of domain knowledge with labor-intensive data curation. In order to facilitate the curation process with high quality in a timely manner, we developed a hybrid neural symbolic system for cancer registry coding. Unlike previous works which mainly worked on the dataset collected from one hospital or formulated the task as text classification problems, we collaborated with two medical centers in Taiwan to compile a cross-hospital corpus and applied neural networks to extract cancer registry variables described in unstructured pathology reports along with expert systems for generating registry coding. We conducted experiments to study the feasibility of the proposed hybrid for the task of cancer registry coding and compare its performance with state-of-the-art non-hybrid approaches. Furthermore, cross-hospital experiments were performed to study the advantages and limitations of transfer learning for processing reports from different sources. The experiment results demonstrated that the proposed hybrid neural symbolic system is a robust approach which works well across hospitals and outperformed classification-based baselines by F-scores of 0.13~0.27. Compared to the baseline models, the F-scores of the proposed approaches are apparently higher when fewer training instances were used. All methods benefited from the transferred parameters learned from the source dataset, but the results suggest that it is a better strategy to transfer the learned knowledge through the concept recognition task followed by the symbolic expert system to address the task of cancer registry coding.

Keywords