Scientific Reports (Feb 2023)
An open automation system for predatory journal detection
Abstract
The growing number of online open-access journals promotes academic exchange, but the prevalence of predatory journals undermines the scholarly reporting process. Tools designed to distinguish legitimate from predatory academic journals and publisher websites typically involve data collection, feature extraction, and model prediction; these steps form the basis of the proposed academic journal predatory checking (AJPC) system, which is built on machine learning methods. The AJPC data collection stage gathers information from 833 blacklisted and 1213 whitelisted journal websites. The feature extraction stage identifies words and phrases that may indicate a predatory journal, and the prediction stage applies eight classification algorithms to distinguish potentially predatory from legitimate journals. We found that augmenting the bag-of-words model and the TF-IDF algorithm with diff scores (a measure of differences in specific word frequencies between journals) improves classification efficiency and helps identify feature words characteristic of predatory journals. Performance tests suggest that our system works as well as or better than tools currently used to identify suspect publishers and publications. The open system provides reference results rather than absolute judgments, and it accepts user inquiries and feedback so that the system can be updated and its performance optimized.
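As a rough illustration of the kind of pipeline the abstract describes (not the authors' implementation), the sketch below combines a simple diff-score computation with TF-IDF features and a single classifier. The diff_scores and train_checker functions, the use of scikit-learn, and the choice of logistic regression are all assumptions for illustration; the paper itself compares eight classification algorithms.

# Illustrative sketch only; not the AJPC authors' code.
# Assumes texts and labels have already been scraped from blacklisted
# and whitelisted journal websites.

from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split


def diff_scores(blacklist_texts, whitelist_texts):
    """Difference in relative word frequencies between the two corpora
    (one possible reading of the 'diff score' mentioned in the abstract)."""
    black = Counter(w for t in blacklist_texts for w in t.lower().split())
    white = Counter(w for t in whitelist_texts for w in t.lower().split())
    b_total = sum(black.values()) or 1
    w_total = sum(white.values()) or 1
    vocab = set(black) | set(white)
    return {w: black[w] / b_total - white[w] / w_total for w in vocab}


def train_checker(texts, labels):
    """TF-IDF features plus one of many possible classifiers
    (logistic regression here, purely as an example)."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=0, stratify=labels)
    vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vectorizer.fit_transform(X_train), y_train)
    print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
    return vectorizer, clf

In practice, high-magnitude diff-score terms could be inspected alongside the TF-IDF feature weights to surface words and phrases that are characteristic of predatory journal websites, which is the role the abstract attributes to the feature extraction stage.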