IEEE Access (Jan 2021)
An Empirical Study on Using Multi-Labels for Issues in GitHub
Abstract
On GitHub, one of the most successful services for software project hosting, labels are used to represent various kinds of information and decisions about reported issues. However, previous studies on labels have been limited to simple statistics or to label recommendation for issue types. In this paper, we aim to provide a better understanding of labels and their usage in software development, focusing in particular on the use of multiple and custom labels on issues. To analyze label usage, we collected software project data and label usage information from GitHub. We then quantitatively investigated the performance of projects that use multi-label features and qualitatively investigated the categories of multi-labels and how multi-labels are used within these categories. Our results show that multi-labels are common in the majority of software projects and that projects using multi-label features manage their issues more effectively. In addition, our results reveal the different types of information that labels represent, which relate to features, development, and issues. These findings can inform future studies on issue management and help in developing labeling techniques that reduce the burden of issue management.
Keywords