IEEE Access (Jan 2024)
MP-SPILDL: A Massively Parallel Inductive Logic Learner in Description Logic
Abstract
This article presents MP-SPILDL, a massively parallel inductive logic learner in Description Logic (DL). MP-SPILDL is a scalable Inductive Logic Programming (ILP) algorithm that exploits existing Big Data infrastructure to perform large-scale inductive logic learning in DL, specifically the $\mathcal{ALCQI}^{(\mathcal{D})}$ language. MP-SPILDL accelerates both hypothesis search and hypothesis evaluation by aggregating the computing power of multi-core CPUs, their vector/SIMD instructions, and multiple GPUs in a Hadoop cluster. For hypothesis search, MP-SPILDL employs a novel MapReduce-based algorithm that performs distributed parallel hypothesis search, together with a novel MapReduce-based procedure that eliminates all redundant hypotheses generated after each learning iteration. Moreover, like DL-Learner, the state of the art in DL-based ILP, MP-SPILDL uses a deterministic ordering of hypotheses’ operands to avoid exploring redundant areas of the search space. For hypothesis evaluation, MP-SPILDL evaluates hypotheses in parallel using all CPU cores, with their vector instructions, and all GPUs of every machine in the Hadoop cluster. In experiments with an Apache Spark implementation on a Hadoop cluster of three worker machines (36 CPU cores and 7 GPUs in total), MP-SPILDL achieved up to a 13.3-fold speedup using parallel beam search with $beamWidth = 32$ and CPU-based vectorized hypothesis evaluation, which was the best-case scenario. On small datasets such as Michalski’s trains, MP-SPILDL ran slower than the baseline, which was the worst-case scenario.
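To illustrate how a deterministic ordering of operands can make redundant hypotheses detectable, the following is a minimal single-machine sketch in Python. It is not MP-SPILDL's implementation: the tuple encoding of concepts, the names `canonical` and `deduplicate`, and the use of `repr` as the ordering key are all assumptions made for this example. The idea shown is that once the operands of commutative constructors (conjunction, disjunction) are put in a fixed order, syntactically different but equivalent hypotheses map to the same key, so a map step followed by a group-by-key step removes the duplicates.

```python
from typing import Iterable, Union

# Hypothetical encoding: a concept is an atomic name (str) or a tuple
# (operator, operand, ...), e.g. ("some", "hasCar", ("and", "Long", "Closed")).
Concept = Union[str, tuple]

# Constructors whose operand order does not change the meaning.
COMMUTATIVE = {"and", "or"}


def canonical(concept: Concept) -> Concept:
    """Return the concept with commutative operands in a fixed order."""
    if isinstance(concept, str):
        return concept
    op, *args = concept
    args = [canonical(a) for a in args]
    if op in COMMUTATIVE:
        # repr() gives a total order over mixed str/tuple operands;
        # dict.fromkeys drops repeated operands while keeping that order.
        args = list(dict.fromkeys(sorted(args, key=repr)))
        if len(args) == 1:
            return args[0]  # "A and A" collapses to "A"
    return (op, *args)


def deduplicate(hypotheses: Iterable[Concept]) -> list:
    """Keep one representative per canonical form."""
    keyed = (canonical(h) for h in hypotheses)  # "map": compute the key
    unique = {}
    for key in keyed:                           # "reduce": one entry per key
        unique.setdefault(key, key)
    return list(unique.values())


h1 = ("and", "Train", ("some", "hasCar", ("and", "Long", "Closed")))
h2 = ("and", ("some", "hasCar", ("and", "Closed", "Long")), "Train")
survivors = deduplicate([h1, h2, "Train"])
```

Here `h1` and `h2` differ only in operand order, so they share one canonical form and only one of them survives. In a distributed setting the same keying could presumably feed a shuffle-based step (for example, grouping by key in Spark), but that mapping is an assumption and not something the abstract specifies.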
Keywords