IEEE Access (Jan 2023)

ML-Augmented Automation for Recovering Links Between Pull-Requests and Issues on GitHub

  • Zakarea Alshara
  • Hamzeh Eyal Salman
  • Anas Shatnawi
  • Abdelhak-Djamel Seriai

DOI
https://doi.org/10.1109/ACCESS.2023.3236392
Journal volume & issue
Vol. 11
pp. 5596 – 5608

Abstract


GitHub provides a distributed, collaborative platform for developing and maintaining open-source projects. This social coding platform supports collaborative development, with or without coordination, through two artefacts: pull-requests and issues. When the number of issues submitted each day grows rapidly, especially in popular repositories, managing them becomes more complicated. To help a repository's developers process issues, external contributors fix issues by submitting pull-requests. On GitHub, a pull-request is frequently linked to a submitted issue to show that a solution is in progress. Unfortunately, contributors may be careless or forget to link pull-requests to their corresponding issues. Only a very small share of these links is established, whereas a large portion is missing from the development history. Moreover, even for senior developers, manually recovering the links between pull-requests and issues from the evolutionary development history is a time-consuming, challenging, and error-prone task. In this article, we propose building ML models that recover links between pull-requests and their issues using two machine learning algorithms (KMeans and BIRCH) based on lexical and semantic weighting measurements. These models are evaluated on the PI-Link ground-truth dataset. The results show that links between pull-requests and issues can be recovered with an accuracy of 91.5% using the BIRCH clustering algorithm.
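The abstract names the ingredients (text weighting plus clustering) but not the pipeline. The following is a minimal, hypothetical sketch of the general idea, not the authors' implementation. It assumes TF-IDF as the lexical weighting, a hand-rolled k-means using cosine similarity with deterministic farthest-first seeding, and a linking rule that pairs each pull-request with the most similar issue in its own cluster. Semantic weighting, BIRCH, and the PI-Link dataset are not modelled, and every function name here (`tfidf_vectors`, `kmeans`, `link_prs_to_issues`, etc.) is illustrative.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional

Vector = Dict[str, float]  # sparse term -> weight map


def tokenize(text: str) -> List[str]:
    # Crude lowercase word tokenizer: no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec: Vector) -> Vector:
    # Scale to unit L2 length so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    # Lexical weighting (assumed): term frequency x smoothed inverse document frequency.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def dot(a: Vector, b: Vector) -> float:
    # Iterate over the smaller sparse vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def centroid(members: List[Vector]) -> Vector:
    # Mean of the member vectors, re-normalised to unit length.
    total: Dict[str, float] = defaultdict(float)
    for vec in members:
        for t, v in vec.items():
            total[t] += v
    return normalize({t: v / len(members) for t, v in total.items()})


def kmeans(vectors: List[Vector], k: int, max_iter: int = 20) -> List[int]:
    # Deterministic farthest-first seeding (assumes k <= len(vectors)).
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each vector to the centroid with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = centroid(members)
    return labels


def link_prs_to_issues(issues: List[str], prs: List[str],
                       k: int) -> Dict[int, Optional[int]]:
    # Cluster issues and PRs together, then link each PR to the most similar
    # issue in its own cluster (None if that cluster contains no issue).
    vectors = tfidf_vectors(issues + prs)
    labels = kmeans(vectors, k)
    links: Dict[int, Optional[int]] = {}
    for p in range(len(prs)):
        idx = len(issues) + p
        candidates = [i for i in range(len(issues)) if labels[i] == labels[idx]]
        links[p] = (max(candidates, key=lambda i: dot(vectors[i], vectors[idx]))
                    if candidates else None)
    return links


if __name__ == "__main__":
    issues = ["Crash when parsing empty config file",
              "Add dark mode theme to settings page"]
    prs = ["Fix crash parsing empty config file",
           "Implement dark mode theme in settings"]
    print(link_prs_to_issues(issues, prs, k=2))  # {0: 0, 1: 1}
```

Restricting candidates to the pull-request's own cluster is what makes clustering useful in this sketch: it prunes the issue set before the similarity ranking. The number of clusters `k` is a free parameter that the sketch does not tune.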

Keywords