IEEE Access (Jan 2023)
IOSPReD: I/O Specialized Packaging of Reduced Datasets and Data-Intensive Applications for Efficient Reproducibility
Abstract
The data generated by large-scale scientific systems such as NASA’s Earth Observing System Data and Information System is expected to increase substantially. Consequently, applications processing these huge volumes of data suffer from a lack of storage space at the execution site. This poses a critical challenge when sharing data and reproducing application executions with respect to specific user inputs in data-intensive applications. To address this issue, we propose IOSPReD (I/O Specialized Packaging of Reduced Datasets), a data-based debloating framework designed to automatically track and package only the necessary chunks of data (along with the application) in a container. IOSPReD uses the specific inputs provided by the user to identify the necessary data chunks. To do so, it maps the high-level user inputs down to low-level data file offsets. We evaluate IOSPReD on different realistic NASA datasets to assess (i) the amount of data reduction, (ii) the reproducibility of results across multiple application executions, and (iii) the impact on performance.
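As an illustration of what mapping high-level inputs to low-level file offsets can look like, the following minimal sketch assumes a 2-D array stored in row-major order with fixed-size elements after an optional header. It is not the IOSPReD implementation; the function names, the flat storage layout, and the fixed chunk size are hypothetical choices made only for this example.

```python
def slice_to_byte_ranges(shape, itemsize, row_range, col_range, header_bytes=0):
    """Map a rectangular selection of a row-major 2-D array to byte ranges.

    Hypothetical helper: assumes a flat, uncompressed layout.
    Returns a list of (start, end) byte offsets, end exclusive.
    """
    n_rows, n_cols = shape
    r0, r1 = row_range
    c0, c1 = col_range
    if not (0 <= r0 <= r1 <= n_rows and 0 <= c0 <= c1 <= n_cols):
        raise ValueError("selection is outside the array bounds")

    ranges = []
    if c0 == c1:
        return ranges  # empty column selection touches no bytes

    for r in range(r0, r1):
        start = header_bytes + (r * n_cols + c0) * itemsize
        end = header_bytes + (r * n_cols + c1) * itemsize
        if ranges and ranges[-1][1] == start:
            # Contiguous with the previous row's range: merge them.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


def byte_ranges_to_chunks(ranges, chunk_bytes):
    """Return the sorted indices of fixed-size chunks touched by the byte ranges."""
    chunks = set()
    for start, end in ranges:
        chunks.update(range(start // chunk_bytes, (end - 1) // chunk_bytes + 1))
    return sorted(chunks)


# Rows 1-2, columns 2-4 of a 4 x 10 array of 4-byte values.
needed = slice_to_byte_ranges((4, 10), 4, (1, 3), (2, 5))
print(needed)                             # [(48, 60), (88, 100)]
print(byte_ranges_to_chunks(needed, 32))  # [1, 2, 3]
```

Under these assumptions, only chunks 1 to 3 of the file would need to be packaged for this particular user input; real scientific formats add compression and indexing layers that this sketch ignores.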
Keywords