Cross-Domain Visual Exploration of Academic Corpora via the Latent Meaning of User-Authored Keywords

Alejandro Benito-Santos; Roberto Theron Sanchez

doi:10.1109/ACCESS.2019.2929754

IEEE Access (Jan 2019)

Affiliations

Alejandro Benito-Santos: Department of Computer Science and Automation, Visual Analytics and Information Visualization Group, University of Salamanca, Salamanca, Spain
Roberto Theron Sanchez: Department of Computer Science and Automation, Visual Analytics and Information Visualization Group, University of Salamanca, Salamanca, Spain

DOI: https://doi.org/10.1109/ACCESS.2019.2929754
Journal volume & issue: Vol. 7
pp. 98144 – 98160

Abstract

Nowadays, scholars dedicate a substantial amount of their work to the querying and browsing of increasingly large collections of research papers on the Internet. In parallel, the recent surge of novel interdisciplinary approaches in science requires scholars to acquire competencies in new fields for which they may lack the necessary vocabulary to formulate adequate queries. This problem, together with the issue of information overload, poses new challenges in the fields of natural language processing (NLP) and visualization design that call for a rapid response from the scientific community. In this respect, we report on a novel visualization scheme that enables the exploration of research paper collections via the analysis of semantic proximity relationships found in author-assigned keywords. Our proposal replaces traditional string queries with a bag-of-words (BoW) extracted from a user-generated auxiliary corpus that captures the intentionality of the research. Continuing along the lines established by other authors in the fields of literature-based discovery (LBD), NLP, and visual analytics (VA), we combine novel advances in the fields of NLP with visual network analysis techniques to offer scholars a perspective of the target corpus that better fits their research interests. To highlight the advantages of our proposal, we conduct two experiments employing a collection of visualization research papers and an auxiliary cross-domain BoW. Here, we showcase how our visualization can be used to maximize the effectiveness of a browsing session by enhancing the language acquisition task, which allows for effectively extracting knowledge that is in line with the users' previous expectations.
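The idea of replacing a string query with a bag-of-words and ranking author-assigned keywords by proximity can be illustrated with a small sketch. This is not the authors' pipeline: it swaps in plain cosine similarity over keyword-to-paper co-occurrence as a stand-in for the semantic proximity measure, and the function names (`keyword_index`, `cosine`, `rank_keywords`) and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def keyword_index(papers):
    """Map each lowercased keyword to the set of paper ids that list it."""
    index = defaultdict(set)
    for paper_id, keywords in enumerate(papers):
        for kw in keywords:
            index[kw.lower()].add(paper_id)
    return index


def cosine(a, b):
    """Cosine similarity of two binary paper-incidence vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def rank_keywords(papers, bag_of_words):
    """Rank target-corpus keywords by mean similarity to the BoW terms
    that also occur as keywords (the "seeds"). Assumption: exact string
    match links the auxiliary BoW to the target corpus."""
    index = keyword_index(papers)
    bag = {w.lower() for w in bag_of_words}
    seeds = [kw for kw in index if kw in bag]
    if not seeds:
        return []
    scores = {
        kw: sum(cosine(docs, index[s]) for s in seeds) / len(seeds)
        for kw, docs in index.items()
        if kw not in bag
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy target corpus: one list of author-assigned keywords per paper.
papers = [
    ["text analysis", "topic modeling", "word cloud"],
    ["text analysis", "topic modeling"],
    ["graph drawing", "network layout"],
]
# Toy auxiliary bag-of-words standing in for the user's research interests.
bag_of_words = {"text analysis", "corpus", "poetry"}

ranked = rank_keywords(papers, bag_of_words)
for keyword, score in ranked:
    print(f"{keyword}: {score:.3f}")
```

In this toy setting, keywords that share papers with a BoW term rise to the top of the ranking, while keywords from unrelated papers score zero. A fuller treatment would presumably match BoW terms and keywords semantically rather than by exact string.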

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
