Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study

Mayara Khadhraoui; Hatem Bellaaj; Mehdi Ben Ammar; Habib Hamam; Mohamed Jmaiel

doi:10.3390/app12062891

Applied Sciences (Mar 2022)

Affiliations

Mayara Khadhraoui: National Engineering School of Sfax (ENIS), University of Sfax, Sfax 3038, Tunisia
Hatem Bellaaj: ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia
Mehdi Ben Ammar: Solutions Galore Inc., Moncton, NB E1C 5Y1, Canada
Habib Hamam: Faculty of Engineering, Université de Moncton, Moncton, NB E1A 3E9, Canada
Mohamed Jmaiel: ReDCAD Laboratory, Department of Computer Engineering and Applied Mathematics, University of Sfax, Sfax 3029, Tunisia

DOI: https://doi.org/10.3390/app12062891
Journal volume & issue: Vol. 12, no. 6, p. 2891

Abstract

On 30 January 2020, the World Health Organization announced a new coronavirus, which later proved to be very dangerous. Since then, COVID-19 has spread into a pandemic that has affected practically every region of the world, and many medical researchers have contributed to fighting it. In this context, and given the rapid growth of scientific publications related to the pandemic, manual text and data retrieval has become a challenging task. To address this challenge, we propose CovBERT, a pre-trained language model based on BERT, to automate the literature review process. CovBERT relies on prior training on a large corpus of scientific publications in the biomedical domain and related to COVID-19 to increase its performance on the literature review task. We evaluate CovBERT on short-text classification using COV-Dat-20, our scientific dataset of biomedical articles on COVID-19. We demonstrate statistically significant improvements by using BERT.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
