Jisuanji kexue (Sep 2022)

Short Texts Feature Enrichment Method Based on Heterogeneous Information Network

  • LYU Xiao-feng, ZHAO Shu-liang, GAO Heng-da, WU Yong-liang, ZHANG Bao-qi

DOI
https://doi.org/10.11896/jsjkx.210700241
Journal volume & issue
Vol. 49, no. 9
pp. 92 – 100

Abstract


With the deep integration of computer technology into social life, more and more short text messages are spread across web platforms. To address the data sparsity of short texts, a robust heterogeneous information network framework (HTE) for modeling short texts is constructed. The framework can integrate any type of additional information and capture the relationships among these pieces of information, which alleviates the data sparsity problem. Based on this framework, six short text expansion methods are designed using different external knowledge. Short text features are enriched by introducing entity information (entities, entity categories and inter-entity relationships) and textual information (such as text topics) from the Wikipedia and Freebase knowledge bases. Finally, similarity measurement results are used to evaluate the methods. On two short text datasets, the six text expansion methods are compared with three traditional similarity measures and with current mainstream short text matching algorithms, and all six achieve improved results. Compared with BERT, the best method improves the similarity measurement results by 5.97%. The proposed framework is robust and can incorporate any type of external knowledge, and the proposed methods can overcome the data sparsity of short texts and measure short text similarity with high accuracy in an unsupervised manner.
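To make the idea of feature enrichment concrete, the following is a minimal sketch, not the authors' HTE implementation. It assumes a tiny hypothetical knowledge base (`TOY_KB`, `TOY_RELATIONS`) standing in for Wikipedia/Freebase lookups, treats each feature's prefix (`word:`, `entity:`, `category:`) as the node type in a heterogeneous network, and uses cosine similarity as an assumed stand-in for the unnamed "traditional" similarity measures. Topic features are omitted for brevity. Two texts with no words in common get a nonzero similarity once linked entities, their categories and one-hop related entities are added.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base standing in for Wikipedia/Freebase lookups:
# surface word -> (entity id, entity category). Not real KB data.
TOY_KB = {
    "jobs": ("Steve_Jobs", "person"),
    "cook": ("Tim_Cook", "person"),
    "apple": ("Apple_Inc", "company"),
    "iphone": ("IPhone", "product"),
}

# Hypothetical typed entity-entity relations (edges between entity nodes).
TOY_RELATIONS = [
    ("Steve_Jobs", "founder_of", "Apple_Inc"),
    ("Tim_Cook", "ceo_of", "Apple_Inc"),
    ("Apple_Inc", "maker_of", "IPhone"),
]


def plain(text):
    """Bag of words only: the sparse baseline representation."""
    return Counter("word:" + w for w in text.lower().split())


def enrich(text):
    """Bag of typed features: words, linked entities, their categories,
    and entities one relation hop away from a mentioned entity."""
    feats = plain(text)
    entities = set()
    for word in text.lower().split():
        if word in TOY_KB:
            ent, cat = TOY_KB[word]
            entities.add(ent)
            feats["entity:" + ent] += 1
            feats["category:" + cat] += 1
    # Each relation edge touching a mentioned entity adds one count
    # to the entity at its other end (edges followed in both directions).
    for head, _rel, tail in TOY_RELATIONS:
        if head in entities:
            feats["entity:" + tail] += 1
        if tail in entities:
            feats["entity:" + head] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


s1 = "Jobs unveiled the iPhone"
s2 = "Cook runs Apple"
print(round(cosine(plain(s1), plain(s2)), 3))    # no shared words
print(round(cosine(enrich(s1), enrich(s2)), 3))  # overlap via entities
```

With these toy inputs the plain bag-of-words similarity is 0.0, while the enriched representations share `category:person`, `entity:Apple_Inc`, `entity:Steve_Jobs` and `entity:IPhone`, giving a similarity of about 0.522. A real system would replace the dictionaries with entity linking and knowledge-base queries and could weight each feature type differently.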

Keywords