Jisuanji kexue (Dec 2022)

Classification of Unreproducible Build Causes Based on Log Information

  • MA Zhao, LIU Dong, REN Zhi-lei, JIANG He

DOI
https://doi.org/10.11896/jsjkx.220300227
Journal volume & issue
Vol. 49, no. 12
pp. 109 – 117

Abstract

Read online

Reproducible build is the ability to recreate binary artifacts in a predefined build environment.Due to the role of reproducible build in ensuring the security of software construction environment and improving the efficiency of software construction and distribution,many open source software repositories(such as Debian) have carried out software reproducible build practice.However,due to the lack of sufficient judgment information and the complexity and diversity of source files,it is still a time-consuming and laborious challenge to determine why software can not be built reproducibly.In order to overcome this challenge,this paper studies the classification and detection of software unreproducible build causes based on machine learning.This paper stu-dies four typical reasons for unreproducible build,namely timestamp,fileordering,randomness and locale.This method uses the word vector generated by word2vec to represent the text log,and then cooperates with the logistic regression model to learn and train the text corpus combined with the difference log and the build log,so as to realize the automatic classification of the causes of unreproducible build.In this paper,the algorithm is implemented and tested on 671 unreproducible build Debian software packa-ges.Experimental results show that our method achieves a macro average precision of 80.75% and a macro average recall of 86.07%,which are better than other commonly used machine learning algorithms.In addition,we also analyze the relevance and importance of difference log and build log.Result indicates that both of them are significant for the classification of unreproducible build causes.This method provides a reliable research basis for automatic classification of unreproducible build causes.

Keywords