Development of message passing-based graph convolutional networks for classifying cancer pathology reports

Hong-Jun Yoon; Hilda B. Klasky; Andrew E. Blanchard; J. Blair Christian; Eric B. Durbin; Xiao-Cheng Wu; Antoinette Stroup; Jennifer Doherty; Linda Coyle; Lynne Penberthy; Georgia D. Tourassi

doi:10.1186/s12911-024-02662-5

BMC Medical Informatics and Decision Making (Sep 2024)

Affiliations

Hong-Jun Yoon: Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Hilda B. Klasky: Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Andrew E. Blanchard: Computational Sciences and Engineering Division, Oak Ridge National Laboratory
J. Blair Christian: Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Eric B. Durbin: College of Medicine, University of Kentucky
Xiao-Cheng Wu: Louisiana Tumor Registry, Louisiana State University Health Sciences Center, School of Public Health
Antoinette Stroup: New Jersey State Cancer Registry, Rutgers Cancer Institute of New Jersey
Jennifer Doherty: Utah Cancer Registry, Huntsman Cancer Institute, University of Utah
Linda Coyle: Information Management Services, Inc.
Lynne Penberthy: Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute
Georgia D. Tourassi: National Center for Computational Sciences, Oak Ridge National Laboratory

Journal volume & issue: Vol. 24, no. S5, pp. 1–11

Abstract

Background: Graph convolutional networks (GCNs) applied to graph-of-words features (TextGCN) have been studied for the classification of free-form natural language texts and confirmed to be an effective means of describing complex natural language texts. However, text classification models based on TextGCN have weaknesses in memory consumption and in model dissemination and distribution. In this paper, we present a fast message passing network (FastMPN), a GCN implemented with a message passing architecture that provides versatility and flexibility by allowing trainable node embeddings and edge weights, helping the GCN model find a better solution. We applied the FastMPN model to the task of clinical information extraction from cancer pathology reports, extracting the following six properties: main site, subsite, laterality, histology, behavior, and grade.

Results: We evaluated the clinical task performance of the FastMPN models in terms of micro- and macro-averaged F1 scores and compared them with the multi-task convolutional neural network (MT-CNN) model. The results show that the FastMPN model is equivalent to or better than the MT-CNN.

Conclusions: Our FastMPN implementation, which is based on the PyTorch platform, can train on a large corpus (667,290 training samples) with 202,373 unique words in less than 3 minutes per epoch using one NVIDIA V100 hardware accelerator. Our experiments demonstrated that, with this implementation, the clinical task performance scores for extracting tumor-related information from cancer pathology reports were highly competitive.
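The abstract reports micro- and macro-averaged F1 scores without defining them. As a minimal sketch, assuming single-label multi-class predictions for one task, the two averages can be computed as below. The function name `f1_scores` and the laterality labels are hypothetical and made up for illustration; they are not taken from the paper or its data.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one sample is required")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical labels for illustration only (not data from the paper).
y_true = ["left", "left", "left", "left", "right", "bilateral"]
y_pred = ["left", "left", "left", "left", "left", "bilateral"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this single-label setting, micro-F1 equals overall accuracy, so frequent classes dominate it, while macro-F1 weights every class equally and drops sharply when a rare class is missed. That difference is why the two scores are usually reported together for tasks with skewed class distributions.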

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
