IEEE Access (Jan 2024)
MP-HTHEDL: A Massively Parallel Hypothesis Evaluation Engine in Description Logic
Abstract
We present MP-HTHEDL, a massively parallel hypothesis evaluation engine for inductive learning in description logic (DL). MP-HTHEDL extends our previous work, HT-HEDL, which also targeted improving hypothesis evaluation performance for inductive logic programming (ILP) algorithms that use DL as their representation language. Unlike HT-HEDL, MP-HTHEDL is a massively parallel approach that improves hypothesis evaluation performance through horizontal scaling, exploiting the computing capabilities of all CPUs and GPUs across the networked machines of a Hadoop cluster. Many modern CPUs have extended instruction sets that accelerate specific types of computations, especially data-parallel (vector) computations. For CPU-based hypothesis evaluation, MP-HTHEDL employs vectorized multiprocessing rather than HT-HEDL's vectorized multithreading; both approaches, however, combine the classical scalar processing of multi-core CPUs with the extended vector instructions of each CPU core. Combining scalar and vector processing extracts more performance from the CPUs. In experiments with an Apache Spark implementation on a Hadoop cluster of three worker nodes with a total of 36 CPU cores and 7 GPUs, using only the scalar processing power of the multi-core CPUs yielded a speedup of up to ~25.4-fold. Combining scalar processing with the extended vector instructions of those CPUs increased the gains from ~25.4-fold to ~67-fold on the same cluster; these speedups were achieved using CPU-based processing alone. For GPU-based evaluation, MP-HTHEDL achieved a speedup of up to ~161-fold using the GPUs of the same three worker nodes.
Keywords