SoftwareX (Jul 2023)
MetricHunter: A software metric dataset generator utilizing SourceMonitor upon public GitHub repositories
Abstract
Version control systems are pervasively consulted nowadays to obtain software metric datasets. Accordingly, machine learning is applied to predict different aspects of a software including quality monitoring, influence analysis, etc. However, construction of a metric dataset is challenging and the dataset content may affect the success of the learning-based models. In this study, we propose a dataset construction tool, MetricHunter, which is able to produce platform/language specific datasets that can be used for predicting the features of newly created software. The proposed tool is developed by C# programming language utilizing a known metric gathering tool, i.e. SourceMonitor, and the GitHub REST API for public repositories. Thus, one can construct a proper dataset from a graphical user interface by simply specifying the programming language or target platform. The outputs of the tool on a set of repositories are validated by investigating automatically generated attribute values and comparing them with the measurements of metric gathering tools as well as the GitHub metric values.