Refining electronic medical records representation in manifold subspace

Bolin Wang; Yuanyuan Sun; Yonghe Chu; Di Zhao; Zhihao Yang; Jian Wang

doi:10.1186/s12859-022-04653-7

BMC Bioinformatics (Apr 2022)

Affiliations

Bolin Wang: College of Computer Science and Technology, Dalian University of Technology
Yuanyuan Sun: College of Computer Science and Technology, Dalian University of Technology
Yonghe Chu: College of Computer Science and Technology, Dalian University of Technology
Di Zhao: College of Computer Science and Technology, Dalian University of Technology
Zhihao Yang: College of Computer Science and Technology, Dalian University of Technology
Jian Wang: College of Computer Science and Technology, Dalian University of Technology

Journal volume & issue: Vol. 23, no. 1
pp. 1 – 17

Abstract

Background

Electronic medical records (EMR) contain detailed information about patient health, and an effective representation model is of great significance for downstream EMR applications. However, EMR data are difficult to process directly because they are incomplete, unstructured and redundant, so preprocessing the raw data is a key step in EMR data mining. Classic distributed word representations ignore the geometric structure of the word vectors when representing EMR data. As a result, they often underestimate the similarity between similar words and overestimate the similarity between distant words. Word similarities obtained from such embedding models are therefore inconsistent with human judgment, and much valuable medical information is lost.

Results

In this study, we propose a biomedical word embedding framework based on manifold subspace. The model first obtains word vector representations of the EMR data and then re-embeds the word vectors in the manifold subspace. We develop an efficient optimization algorithm for neighborhood preserving embedding based on manifold optimization. To verify the algorithm, we perform experiments on intrinsic evaluation and extrinsic classification tasks, and the results demonstrate its advantages over other baseline methods.

Conclusions

Manifold subspace embedding can enhance distributed word representations of electronic medical record text. It reduces the difficulty researchers face in processing unstructured EMR text data and is therefore of value for biomedical research.
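The neighborhood preserving embedding (NPE) step named in the Results can be illustrated with a minimal sketch. Note that this is the classic closed-form linear NPE (local reconstruction weights from k nearest neighbours, followed by a generalized eigenproblem), swapped in for illustration; it is not the authors' manifold-optimization solver. The function names `lle_weights` and `npe`, the neighbourhood size, the target dimension and the regularization constant are hypothetical choices, and the random matrix only stands in for pretrained EMR word vectors.

```python
import numpy as np


def lle_weights(X, n_neighbors=5, reg=1e-3):
    """Reconstruction weights W (n x n): row i rebuilds X[i] from its
    n_neighbors nearest rows, with the weights in each row summing to 1."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:n_neighbors]
        Z = X[nbrs] - X[i]                            # neighbours centred on x_i
        G = Z @ Z.T                                   # local Gram matrix (k x k)
        G += np.eye(n_neighbors) * reg * max(np.trace(G), 1e-12)  # keep G invertible
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()                      # enforce sum-to-one constraint
    return W


def npe(X, n_components=2, n_neighbors=5, reg=1e-3):
    """Linear NPE (hypothetical helper): find A (d x p) minimizing
    sum_i ||y_i - sum_j W_ij y_j||^2 with y_i = A^T x_i, subject to
    A^T (Xc^T Xc + reg*I) A = I. Returns (A, Xc @ A)."""
    Xc = X - X.mean(axis=0)
    n, d = Xc.shape
    W = lle_weights(Xc, n_neighbors, reg)
    I_W = np.eye(n) - W
    M = I_W.T @ I_W                                   # (I - W)^T (I - W)
    lhs = Xc.T @ M @ Xc                               # d x d
    rhs = Xc.T @ Xc + reg * np.eye(d)                 # ridge term in case X^T X is singular
    # Solve lhs a = lambda * rhs a by reducing it to a standard symmetric problem.
    L = np.linalg.cholesky(rhs)                       # rhs = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ lhs @ L_inv.T
    vals, vecs = np.linalg.eigh((C + C.T) / 2.0)      # eigenvalues in ascending order
    A = L_inv.T @ vecs[:, :n_components]              # back-transform: a = L^{-T} v
    return A, Xc @ A


# Usage: re-embed 200 hypothetical 50-d word vectors into a 10-d subspace.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(200, 50))   # stand-in for pretrained EMR word vectors
A, Y = npe(word_vectors, n_components=10, n_neighbors=8)
print(Y.shape)  # (200, 10)
```

Keeping the eigenvectors with the smallest eigenvalues yields the projection that best preserves each word's local reconstruction from its neighbours, which is the sense in which NPE retains local manifold geometry. This sketch builds dense n x n matrices, so it only suits small vocabularies; a practical version would presumably use sparse nearest-neighbour structures.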

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/