LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification

Dian Kurniasari; Warsono; Mustofa Usman; Favorisen Rosyking Lumbanraja; Wamiliana

doi:10.26554/sti.2024.9.2.273-283

Science and Technology Indonesia (Apr 2024)

Affiliations

Dian Kurniasari: Doctoral Student at the Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Warsono: Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Mustofa Usman: Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Favorisen Rosyking Lumbanraja: Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Wamiliana: Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia

DOI: https://doi.org/10.26554/sti.2024.9.2.273-283
Journal volume & issue: Vol. 9, no. 2, pp. 273–283

Abstract

Rising mortality from leukemia has driven rapid growth in publications on the disease. This growth has greatly expanded the biomedical literature and made manual extraction of relevant material on leukemia increasingly difficult. Text classification is one approach to retrieving relevant, high-quality information from the biomedical literature. This research proposes an LSTM-CNN hybrid model for classifying an imbalanced dataset of PubMed abstracts centred on leukemia. Random Undersampling and Random Oversampling are combined to address the class imbalance. Classification performance is further improved by using BioWordVec, a pre-trained word embedding created specifically for the biomedical domain. Model evaluation indicates that hybrid resampling combined with domain-specific pre-trained word embeddings can enhance classification performance, achieving accuracy, precision, recall, and F1-score of 99.55%, 99%, 100%, and 99%, respectively. The results suggest that the proposed approach could serve as an alternative technique for retrieving information about leukemia.
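
The hybrid resampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hybrid_resample`, the `target_per_class` parameter, the toy abstracts, and the 90/10 class split are all hypothetical, chosen only to show how random undersampling of a large class and random oversampling of a small class can be combined to reach a balanced set.

```python
import random
from collections import defaultdict


def hybrid_resample(texts, labels, target_per_class, seed=0):
    """Balance classes to target_per_class samples each (illustrative sketch).

    Classes larger than the target are randomly undersampled (no replacement);
    classes smaller than the target are randomly oversampled (with replacement).
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Random undersampling: keep a random subset of the class.
            chosen = rng.sample(items, target_per_class)
        else:
            # Random oversampling: keep every original and add random duplicates.
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_texts.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_texts, out_labels


# Hypothetical toy data: 90 samples of class 1 and 10 samples of class 0.
texts = [f"majority abstract {i}" for i in range(90)]
texts += [f"minority abstract {i}" for i in range(10)]
labels = [1] * 90 + [0] * 10

bal_texts, bal_labels = hybrid_resample(texts, labels, target_per_class=50)
```

In a setup like this, resampling is usually applied to the training split only, so that duplicated minority samples do not leak into the evaluation data. Whether the paper follows that order is not stated in the abstract.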

Published in Science and Technology Indonesia

ISSN: 2580-4405 (Print); 2580-4391 (Online)
Publisher: Magister Program of Material Sciences, Graduate School of Universitas Sriwijaya
Country of publisher: Indonesia
LCC subjects: Technology; Science
Website: https://sciencetechindonesia.com/index.php/jsti
