International Journal of Computational Intelligence Systems (Dec 2015)
Materialized View Selection Based on Adaptive Genetic Algorithm and Its Implementation with Apache Hive
Abstract
Frequently accessed views in data warehouses are usually materialized to accelerate queries over big data. However, view materialization itself incurs huge costs. Moreover, some recent non-traditional data warehouse products, such as Apache Hive, still lack support for materialized views. To select appropriate views to materialize at the lowest possible cost, we propose a novel approach to the materialized view selection problem based on an adaptive genetic algorithm. We establish a cost model that integrates the query, maintenance, and storage costs to evaluate the performance of approaches and to measure the fitness of an individual in the genetic algorithm. In addition, we introduce adjustable factors for the crossover probability and mutation probability, allowing the genetic algorithm to run quickly and avoid premature convergence. We also conduct extensive experiments on its implementation with Apache Hive, which queries and manages large datasets residing in distributed storage. Both the simulation results and the experiments on Apache Hive show that an approximately optimal solution for selecting materialized views can be obtained effectively using the presented approach.
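As a rough illustration of the two ideas summarized above, the following sketch shows one way a combined cost and fitness-dependent operator probabilities could look. It is not the paper's model: the additive per-view savings, the weights, the linear interpolation rule, and the names `total_cost` and `adaptive_rate` are all assumptions made for this example.

```python
from typing import Sequence


def total_cost(
    selection: Sequence[int],
    base_query_cost: float,
    query_saving: Sequence[float],
    maintenance: Sequence[float],
    storage: Sequence[float],
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of query, maintenance, and storage costs.

    `selection` is a 0/1 chromosome: bit i = 1 means view i is materialized.
    Assumption: each materialized view reduces the query cost by a fixed,
    independent amount, which is a simplification chosen for illustration.
    """
    w_query, w_maint, w_store = weights
    query = base_query_cost - sum(s * q for s, q in zip(selection, query_saving))
    maint = sum(s * m for s, m in zip(selection, maintenance))
    store = sum(s * c for s, c in zip(selection, storage))
    return w_query * query + w_maint * maint + w_store * store


def adaptive_rate(f: float, f_max: float, f_avg: float,
                  high: float, low: float) -> float:
    """Fitness-dependent operator probability (fitness is maximized here).

    Assumed rule: below-average individuals keep the `high` rate so they are
    changed more aggressively; above-average individuals get a rate that
    falls linearly from `high` at the average to `low` at the best fitness.
    """
    if f < f_avg or f_max == f_avg:
        return high
    return high - (high - low) * (f - f_avg) / (f_max - f_avg)
```

With this convention, a lower `total_cost` would map to a higher fitness (for example through its negative or reciprocal), and `adaptive_rate` would be evaluated once for the crossover probability and once for the mutation probability, each with its own `high` and `low` bounds.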
Keywords