IEEE Access (Jan 2023)

An Improved AdaBoost Algorithm for Highly Imbalanced Datasets in the Co-Authorship Recommendation Problem

  • Vo Duc Quang
  • Hoang Huu Viet
  • Vu Hoang Long
  • Tran Dinh Khang

DOI: https://doi.org/10.1109/ACCESS.2023.3306783
Journal volume & issue: Vol. 11, pp. 89107–89123

Abstract

The co-authorship recommendation problem is attractive because it helps researchers extend their collaborations, which improves the quality of scientific articles and promotes innovation. The problem involves suggesting authors who could join a research group to write scientific articles together, based on their research interests, areas of expertise, and past collaborative experience. In this paper, we tackle the co-authorship recommendation problem by modeling it as a co-authorship network, where each author is represented as a vertex and each collaboration between two authors is represented as an edge. Since the number of author pairs without collaboration is much larger than the number with collaboration, datasets created from co-authorship networks are typically two-class imbalanced datasets. Accordingly, we propose an improved AdaBoost algorithm combined with the W-SVM algorithm, called Im.AdaBoost.W-SVM, to solve the classification problem on two-class imbalanced datasets. To evaluate the performance of Im.AdaBoost.W-SVM on the co-authorship recommendation problem, we collected author and article data from https://www.sciencedirect.com through the ScienceDirect APIs and created two-class imbalanced datasets. Experimental results on our self-built co-authorship datasets with different sizes and imbalance ratios show that Im.AdaBoost.W-SVM outperforms the AdaBoost.DecisionTree and AdaBoost.W-SVM algorithms on the co-authorship recommendation problem.
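
To illustrate why datasets built from a co-authorship network tend to be imbalanced, the following minimal Python sketch labels every author pair as collaborating (1) or not (0) and computes the imbalance ratio. The toy `articles` list and the function names are hypothetical and chosen for illustration only; this is not the authors' data pipeline, and it does not include the features or the Im.AdaBoost.W-SVM classifier described in the paper.

```python
from itertools import combinations

# Hypothetical toy data: each article is represented by its list of authors.
articles = [
    ["A", "B", "C"],
    ["A", "D"],
    ["E", "F"],
    ["G"],
    ["H", "I"],
]


def build_coauthor_edges(articles):
    """Return the undirected edges: author pairs that share at least one article."""
    edges = set()
    for authors in articles:
        # Sorting gives each pair one canonical (u, v) form with u < v.
        for u, v in combinations(sorted(set(authors)), 2):
            edges.add((u, v))
    return edges


def label_author_pairs(articles):
    """Label every author pair: 1 if the two have collaborated, 0 otherwise."""
    authors = sorted({a for article in articles for a in article})
    edges = build_coauthor_edges(articles)
    return {pair: int(pair in edges) for pair in combinations(authors, 2)}


labels = label_author_pairs(articles)
positives = sum(labels.values())      # pairs with a collaboration
negatives = len(labels) - positives   # pairs without a collaboration
imbalance_ratio = negatives / positives

print(positives, negatives, imbalance_ratio)  # 6 30 5.0
```

Even in this nine-author example, non-collaborating pairs outnumber collaborating pairs five to one. The number of possible pairs grows quadratically with the number of authors while each author collaborates with only a few others, so the gap would be expected to widen on larger networks.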

Keywords