VLSD—An Efficient Subgroup Discovery Algorithm Based on Equivalence Classes and Optimistic Estimate

Antonio Lopez-Martinez-Carrasco; Jose M. Juarez; Manuel Campos; Bernardo Canovas-Segura

doi:10.3390/a16060274

Algorithms (May 2023)

Affiliations

Antonio Lopez-Martinez-Carrasco: MedAI-Lab, University of Murcia, 30100 Murcia, Spain
Jose M. Juarez: MedAI-Lab, University of Murcia, 30100 Murcia, Spain
Manuel Campos: MedAI-Lab, University of Murcia, 30100 Murcia, Spain
Bernardo Canovas-Segura: MedAI-Lab, University of Murcia, 30100 Murcia, Spain

DOI: https://doi.org/10.3390/a16060274
Journal volume & issue: Vol. 16, no. 6, p. 274

Abstract

Subgroup Discovery (SD) is a supervised data mining technique for identifying a set of relations (subgroups) among the attributes of a dataset with respect to a target attribute. Two key components of this technique are (i) the quality measure, the metric used to score an extracted subgroup, and (ii) the search strategy, which determines how the search space is explored and how the subgroups are obtained. This work proposes (1) a new and efficient SD algorithm that is based on the equivalence class exploration strategy and prunes the search space using an optimistic estimate, and (2) a data structure used to implement the algorithm so that subgroup refinements can be computed easily and efficiently. One of the most important advantages of this algorithm is that it is easy to parallelize. We compared the performance of our SD algorithm with that of other well-known state-of-the-art SD algorithms in terms of runtime, maximum memory usage, subgroups selected, and nodes visited, using a collection of standard and widely used datasets from the relevant literature. The results confirmed that our algorithm is more efficient than the other algorithms considered.
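
To make the roles of the quality measure and of optimistic-estimate pruning concrete, the following is a minimal Python sketch of a generic depth-first subgroup search. It is not the VLSD algorithm and does not reproduce the paper's equivalence-class strategy or its data structure. The choice of Weighted Relative Accuracy (WRAcc) as quality measure, its standard optimistic estimate, and all names (`wracc`, `wracc_optimistic`, `discover`, `ROWS`) are assumptions made here for illustration only.

```python
# Illustrative sketch only: a generic depth-first subgroup search with
# optimistic-estimate pruning. NOT the VLSD algorithm from the paper.
# Assumption: WRAcc is used as the quality measure.

def wracc(tp, fp, TP, FP):
    """WRAcc of a subgroup covering tp positives and fp negatives,
    in a dataset with TP positives and FP negatives."""
    n, N = tp + fp, TP + FP
    if n == 0:
        return 0.0
    return (n / N) * (tp / n - TP / N)

def wracc_optimistic(tp, fp, TP, FP):
    """Upper bound on the WRAcc of any refinement of the subgroup:
    the best case keeps all tp positives and drops every negative."""
    N = TP + FP
    return (tp / N) * (1 - TP / N)

def discover(rows, target, threshold):
    """Return (subgroups, visited): every conjunction of attribute-value
    selectors whose WRAcc is >= threshold, plus the number of candidate
    subgroups that were evaluated."""
    t_attr, t_val = target
    selectors = sorted({(a, v) for r in rows for a, v in r.items() if a != t_attr})
    TP = sum(r[t_attr] == t_val for r in rows)
    FP = len(rows) - TP
    found, visited = [], 0

    def expand(desc, covered, start):
        nonlocal visited
        for i in range(start, len(selectors)):
            a, v = selectors[i]
            if any(a == a2 for a2, _ in desc):
                continue  # at most one selector per attribute
            sub = [r for r in covered if r.get(a) == v]
            tp = sum(r[t_attr] == t_val for r in sub)
            fp = len(sub) - tp
            visited += 1
            if wracc_optimistic(tp, fp, TP, FP) < threshold:
                continue  # prune: no refinement can reach the threshold
            new_desc = desc + [(a, v)]
            q = wracc(tp, fp, TP, FP)
            if q >= threshold:
                found.append((new_desc, q))
            expand(new_desc, sub, i + 1)

    expand([], rows, 0)
    return found, visited

# Hypothetical toy dataset: attribute "a" determines the class, "b" does not.
ROWS = [
    {"a": "x", "b": "p", "cls": "yes"},
    {"a": "x", "b": "q", "cls": "yes"},
    {"a": "y", "b": "p", "cls": "no"},
    {"a": "y", "b": "q", "cls": "no"},
]
```

On this toy dataset, `discover(ROWS, ("cls", "yes"), 0.2)` returns the single subgroup `a = x` with quality 0.25 after evaluating 6 candidates, whereas a threshold low enough to disable pruning evaluates all 8. The gap between those two counts is the kind of saving in nodes visited that the abstract refers to, although the paper's own algorithm and experiments are not represented by this sketch.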

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
