SoftwareX (Dec 2024)
LIC: An R package for optimal subset selection for distributed data
Abstract
The Length and Information Optimization Criterion (LIC) is designed to handle datasets containing redundant information: it identifies and selects the most informative subsets while retaining a large portion of the information in the full dataset. The proposed R package, called LIC, is designed for optimal subset selection in distributed, redundant data. It achieves this by minimizing the length of the final interval estimator while maximizing the amount of information retained in the selected data subset. This functionality is useful in fields such as economics, industry, and medicine. For example, in studies that predict nitrogen oxide emissions from gas turbines, the self-noise of airfoils under stochastic wind conditions, and real estate valuations, LIC can be used to explore the performance of random distributed block methods in parallel computing environments.