Network-Based Document Clustering Using External Ranking Loss for Network Embedding

Yeo Chan Yoon; Hyung Kuen Gee; Heuiseok Lim

doi:10.1109/ACCESS.2019.2948662

IEEE Access (Jan 2019)

Affiliations

Yeo Chan Yoon: Telecommunications and Media Research Laboratory, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
Hyung Kuen Gee: Telecommunications and Media Research Laboratory, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
Heuiseok Lim: Department of Computer Science and Engineering, Korea University, Seoul, South Korea

DOI: https://doi.org/10.1109/ACCESS.2019.2948662
Journal volume & issue: Vol. 7, pp. 155412–155423

Abstract

Network-based document clustering groups documents according to their significance and the strength of the relationships among them. The approach can use various types of metadata that express document significance and inter-document relationships. In this study, we define a probabilistic network graph for fine-grained document clustering and develop a probabilistic generative model and a calculation method. We also devise a novel neural-network-based method for learning network embeddings that accounts for the significance of a document through its ranking under external measures, such as the download counts of relevant files, and reflects the relationship strength between documents. Because significance is taken into account, reputable documents are placed at the center of their clusters and can be shown as representative documents for tasks such as data analysis and data representation. In evaluation tests, the proposed ranking-based network-embedding method performed significantly better than existing network embedding approaches across various algorithms, such as the k-means algorithm and common word/phrase-based clustering methods.
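
The abstract does not state the loss function itself. As a rough illustration of the general idea only, the sketch below pairs a pairwise hinge ranking term (a document with a higher external score, such as a download count, should receive a higher significance score) with a proximity term weighted by relationship strength. Using the embedding norm as the significance score, the function names `ranking_loss` and `proximity_loss`, the margin, and all toy data are assumptions made for this sketch, not the authors' formulation.

```python
import numpy as np


def ranking_loss(emb, external_scores, margin=1.0):
    """Pairwise hinge ranking loss (illustrative assumption).

    For every ordered pair (i, j) in which document i has a strictly higher
    external score than document j, penalize the pair unless the norm of
    emb[i] exceeds the norm of emb[j] by at least `margin`.
    """
    sig = np.linalg.norm(emb, axis=1)                  # assumed significance score
    diff_ext = external_scores[:, None] - external_scores[None, :]
    diff_sig = sig[:, None] - sig[None, :]
    mask = diff_ext > 0                                # pairs where i outranks j
    losses = np.maximum(0.0, margin - diff_sig) * mask
    n_pairs = mask.sum()
    return float(losses.sum() / n_pairs) if n_pairs else 0.0


def proximity_loss(emb, edges, weights):
    """Weighted mean squared distance between linked documents.

    `edges` is an (m, 2) integer array of document index pairs and `weights`
    holds the relationship strength of each edge.
    """
    i, j = edges[:, 0], edges[:, 1]
    d2 = np.sum((emb[i] - emb[j]) ** 2, axis=1)
    return float(np.sum(weights * d2) / np.sum(weights))


# Hypothetical toy data: three documents with 2-D embeddings.
emb = np.array([[3.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
downloads = np.array([100.0, 10.0, 1.0])               # external ranking signal
edges = np.array([[0, 1], [1, 2]])
strengths = np.array([1.0, 0.5])                       # relationship strengths

# Combined objective; the equal weighting of the two terms is arbitrary here.
total = ranking_loss(emb, downloads) + proximity_loss(emb, edges, strengths)
print(total)
```

In this toy setup the embedding norms already follow the download ordering, so the ranking term is small; reversing `downloads` makes it grow, which is the behavior a ranking loss is meant to have. An actual training loop would minimize such an objective with respect to `emb`.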

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
