Biomolecules (Nov 2024)
KnowVID-19: A Knowledge-Based System to Extract Targeted COVID-19 Information from Online Medical Repositories
Abstract
We present KnowVID-19, a knowledge-based system that helps medical researchers and scientists extract targeted information quickly and efficiently from online medical literature repositories such as PubMed, PubMed Central, and other biomedical sources. The system uses open-source machine learning tools, such as GROBID, S2ORC, and BioC, to streamline data extraction and data mining. Central to KnowVID-19 is its keyword-based text classification process, which organizes and categorizes the extracted information. Using machine learning techniques for keyword extraction (specifically RAKE, YAKE, and KeyBERT), KnowVID-19 systematically categorizes publication data into distinct topics and subtopics. This topic structuring improves the system’s ability to match user queries with relevant research, and with it the accuracy and efficiency of the search results. In addition, KnowVID-19 uses the NetworkX Python library to construct networks of the most relevant terms within publications. These networks are then visualized in Cytoscape, providing a graphical representation of the relationships between key terms. The network visualization lets researchers track emerging trends and developments related to COVID-19, long COVID, and associated topics, supporting more informed exploration of the scientific literature. KnowVID-19 also provides an interactive web application with an intuitive, user-centered interface that supports keyword searching and filtering alongside the visual network of term associations. Its responsive design and network visualization enable efficient navigation and access to targeted COVID-19 literature, improving the user experience.
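As a rough illustration of the term-network idea described in the abstract, the following minimal Python sketch counts how often extracted terms co-occur across documents. It is not the KnowVID-19 implementation. The frequency-based `extract_terms` is a simple stand-in for RAKE, YAKE, or KeyBERT, and the stopword list, function names, and sample texts are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "and", "with", "for", "are", "after", "that", "from"}


def extract_terms(text, top_k=5):
    """Return the top_k most frequent non-stopword tokens.

    Assumption: a plain frequency count stands in for a real keyword
    extractor such as RAKE, YAKE, or KeyBERT.
    """
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    counts = Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_k)]


def cooccurrence_edges(documents, top_k=5):
    """Count, for each unordered term pair, the documents containing both."""
    edges = Counter()
    for doc in documents:
        terms = sorted(set(extract_terms(doc, top_k)))
        edges.update(combinations(terms, 2))
    return edges


def weighted_edges(documents, top_k=5):
    """Return sorted (source, target, weight) tuples for the term network."""
    edges = cooccurrence_edges(documents, top_k)
    return sorted((a, b, w) for (a, b), w in edges.items())


if __name__ == "__main__":
    # Hypothetical sample texts, not data from the paper.
    sample = [
        "Fatigue and fatigue-related symptoms after covid-19 infection",
        "Long covid-19 fatigue in adults",
    ]
    for source, target, weight in weighted_edges(sample, top_k=3):
        print(source, target, weight)
```

The `(source, target, weight)` tuples have the shape accepted by NetworkX's `Graph.add_weighted_edges_from`, so a list like this could be loaded into a graph and then exported for viewing in Cytoscape.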
Keywords