Text mining for identification of biological entities related to antibiotic resistant organisms

Kelle Fortunato Costa; Fabrício Almeida Araújo; Jefferson Morais; Carlos Renato Lisboa Frances; Rommel T. J. Ramos

doi:10.7717/peerj.13351

PeerJ (May 2022)

Affiliations

Kelle Fortunato Costa: Programa de pós-graduação em Engenharia Elétrica, Universidade Federal do Pará, Belém, Pará, Brazil
Fabrício Almeida Araújo: Biological Science Institute, Universidade Federal do Pará, Belém, Pará, Brazil
Jefferson Morais: Universidade Federal do Pará, Belém, Pará, Brazil
Carlos Renato Lisboa Frances: Programa de pós-graduação em Engenharia Elétrica, Universidade Federal do Pará, Belém, Pará, Brazil
Rommel T. J. Ramos: Biological Science Institute, Universidade Federal do Pará, Belém, Pará, Brazil

DOI: https://doi.org/10.7717/peerj.13351
Journal volume & issue: Vol. 10, p. e13351

Abstract

Antimicrobial resistance is a significant public health problem worldwide. In recent years, the scientific community has intensified its efforts to combat this problem: many experiments have been conducted, and many articles have been published in this area. However, the growing volume of biological literature makes the biocuration process more difficult because of the cost and time it requires. Modern text mining tools that adopt artificial intelligence technology can help research keep pace. In this article, we propose a text mining model capable of identifying and prioritizing scientific articles in the context of antimicrobial resistance. We retrieved scientific articles from the PubMed database, adopted machine learning techniques to generate vector representations of the retrieved articles, and measured their similarity to the context. As a result of this process, we obtained a dataset labeled “Relevant” and “Irrelevant” and used it to train a supervised learning algorithm to classify new records. The model’s overall accuracy reached 90%, and the F-measure (the harmonic mean of precision and recall) reached 82% for the positive class and 93% for the negative class, indicating that the model performs well at identifying scientific articles relevant to the context. The dataset, scripts and models are available at https://github.com/engbiopct/TextMiningAMR.
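The labeling and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which vector representation or similarity threshold the authors used, so everything here is an assumption for illustration: TF-IDF vectors stand in for the vector representation, cosine similarity stands in for the similarity measure, and the function names and the `threshold` value are hypothetical. The supervised classifier itself is not shown; only similarity-based labeling and the per-class F-measure are.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document.

    TF-IDF is an assumed stand-in; the abstract does not name the
    vector representation the authors used.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so a term found in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_by_similarity(context, abstracts, threshold=0.1):
    """Label each abstract by its similarity to a context description.

    The threshold is a hypothetical value chosen for this example.
    """
    vectors = tfidf_vectors([context] + list(abstracts))
    context_vec, abstract_vecs = vectors[0], vectors[1:]
    return [
        "Relevant" if cosine(context_vec, v) >= threshold else "Irrelevant"
        for v in abstract_vecs
    ]


def f_measure(y_true, y_pred, positive):
    """F-measure for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    context = "antimicrobial resistance antibiotic resistant bacteria"
    abstracts = [
        "Antibiotic resistance genes in resistant bacteria",
        "Steel bridge load testing",
    ]
    print(label_by_similarity(context, abstracts))
```

Reporting the F-measure separately for each class, as the abstract does, matters when the classes are imbalanced: overall accuracy can look high even if the smaller "Relevant" class is recognized less well.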

Published in PeerJ

ISSN: 2167-8359 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Medicine; Science: Biology (General)
Website: https://peerj.com/
