Journal of King Saud University: Computer and Information Sciences (Apr 2022)
Frequent itemset-based feature selection and Rider Moth Search Algorithm for document clustering
Abstract
Document clustering has recently received considerable attention for the retrieval, navigation, and summarization of large document collections. With an effective clustering approach, computers can automatically organize a document corpus into meaningful clusters, enabling efficient navigation and browsing of the corpus. Document navigation and browsing is a valuable complement to information retrieval technologies, compensating for their deficiencies. This paper introduces a Modsup-based frequent itemset approach and the Rider Optimization-based Moth Search Algorithm (Rn-MSA) for document clustering. First, the input documents are pre-processed, and features are then extracted based on TF-IDF and WordNet. Next, feature selection is carried out using frequent itemsets to establish feature knowledge. Finally, document clustering is performed using the proposed Rn-MSA, which is designed by combining the Rider Optimization Algorithm (ROA) and the Moth Search Algorithm (MSA). The performance of the proposed Modsup + Rn-MSA document clustering is evaluated in terms of precision, recall, F-Measure, and accuracy. The developed method achieves a maximal precision of 95.90%, a maximal recall of 96.41%, a maximal F-Measure of 96.41%, and a maximal accuracy of 95.12%, which indicates its superiority.
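
As a rough illustration of the feature extraction and selection steps described in the abstract, the following Python sketch computes TF-IDF weights for tokenized documents and then keeps only the terms that belong to a frequent itemset. It is a minimal sketch under stated assumptions, not the paper's method: plain document-frequency support stands in for the Modsup measure, whose definition is not given in the abstract, and the function names (`tf_idf`, `frequent_itemsets`, `select_features`), the `max_size` limit, and the toy corpus are all hypothetical. WordNet features and the Rn-MSA clustering step are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return one {term: weight} dict per document (docs are token lists)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def frequent_itemsets(docs, min_support, max_size=2):
    """Return {itemset: support} for term sets found in at least
    min_support (a fraction) of the documents.

    ASSUMPTION: ordinary support is used here in place of Modsup.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    term_counts = Counter(t for s in doc_sets for t in s)
    # Only frequent single terms can appear in larger frequent itemsets.
    frequent_terms = sorted(
        t for t, c in term_counts.items() if c / n >= min_support
    )
    result = {}
    for size in range(1, max_size + 1):
        for cand in combinations(frequent_terms, size):
            support = sum(1 for s in doc_sets if s.issuperset(cand)) / n
            if support >= min_support:
                result[frozenset(cand)] = support
    return result


def select_features(docs, min_support):
    """Keep the terms that occur in at least one frequent itemset."""
    itemsets = frequent_itemsets(docs, min_support)
    return sorted(set().union(*itemsets)) if itemsets else []


# Toy corpus (hypothetical), already tokenized and pre-processed.
docs = [
    ["cluster", "document", "search"],
    ["cluster", "document", "moth"],
    ["cluster", "rider", "search"],
    ["document", "moth", "rider"],
]
print(select_features(docs, min_support=0.75))  # ['cluster', 'document']
```

In a fuller pipeline, the selected terms would restrict the TF-IDF vectors passed on to the clustering stage. The enumeration above checks every candidate combination directly, which is adequate for a toy corpus but would need Apriori-style pruning on a realistic vocabulary.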