Journal of Universal Computer Science (Jan 2025)

Cross-Community Question Relevance Prediction for Stack Overflow and GitHub

  • Song Yu,
  • Bugao Jiang,
  • Danni Zhang,
  • Zhifang Liao

DOI
https://doi.org/10.3897/jucs.119772
Journal volume & issue
Vol. 31, no. 1
pp. 52 – 71

Abstract


As the open-source community has evolved, Stack Overflow (SO) has come into widespread use. The question-and-answer community’s mechanism for recommending related questions helps users discover more content relevant to their current problems, expediting issue resolution. However, recommending relevant questions within a single community limits both the amount and the diversity of available content, and the recommendation results rely heavily on the community's existing knowledge. As a result, Stack Overflow still harbors a substantial number of unresolved questions. To address this situation, this paper proposes a cross-community question relevance prediction model, CCQRP, which predicts the relevance of Stack Overflow questions and GitHub (GH) issues and recommends relevant GitHub issues. CCQRP aims to help developers resolve problems effectively and improve development efficiency. We design an embedding layer incorporating BERTOverflow and Bi-LSTM and devise a weighted attention matrix based on the named entity types of tokens. This matrix assigns different weights to tokens of different named entity types during prediction, capturing the information critical to predicting the relevance of SO questions and GH issues. Because no suitable dataset exists, we construct the Question-Issue dataset (QI), consisting of Stack Overflow questions, GitHub issues, and the corresponding question-issue relevance labels; it contains 240,000 related SO question-GH issue pairs and 470,000 unrelated pairs. We evaluate the effectiveness of CCQRP on QI. Compared to the latest models (MQDD, CodeBERT, ASIM), CCQRP improves the F1-score by 0.60% to 10.86% and exhibits robust generalization capabilities.
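
The idea of weighting attention by named entity type can be illustrated with a minimal sketch. It is not the CCQRP implementation: the abstract does not give the formula, so the entity type names, the weight values in `TYPE_WEIGHTS`, and the choice to add the log of each issue token's weight to its score before the softmax are all assumptions made for illustration. With this choice, the attention a question token pays to an issue token is proportional to that token's type weight times the exponential of their dot-product score.

```python
import math

# Hypothetical entity types and weights, chosen only for illustration.
TYPE_WEIGHTS = {"O": 1.0, "LIBRARY": 2.0, "ERROR": 3.0}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_attention(q_vecs, i_vecs, i_types, type_weights=TYPE_WEIGHTS):
    """For each question token, return attention over the issue tokens.

    Assumed scheme: score(i, j) = q_i . k_j + log(w(type_j)), so that after
    the softmax each issue token's share is scaled by its entity-type weight.
    Unknown types fall back to a weight of 1.0.
    """
    biases = [math.log(type_weights.get(t, 1.0)) for t in i_types]
    attention = []
    for qv in q_vecs:
        scores = [dot(qv, iv) + b for iv, b in zip(i_vecs, biases)]
        attention.append(softmax(scores))
    return attention


if __name__ == "__main__":
    # Two issue tokens with identical embeddings but different entity types.
    question = [[1.0, 0.0]]
    issue = [[1.0, 0.0], [1.0, 0.0]]
    print(weighted_attention(question, issue, ["O", "ERROR"]))
```

In this toy input the two issue tokens have the same embedding, so any difference in their attention comes from the entity-type weights alone. The weights here are hand-picked constants, and the sketch does not say how CCQRP obtains its own.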

Keywords