Knowledge Engineering and Data Science (May 2023)
Maximum Marginal Relevance and Vector Space Model for Summarizing Students' Final Project Abstracts
Abstract
Automatic summarization is reducing a text document with a computer program to create a summary that retains the essential parts of the original document. Automatic summarization is necessary to deal with information overload, and the amount of data is increasing. A summary is needed to get the contents of the article briefly. A summary is an effective way to present extended information in a concise form of the main contents of an article, and the aim is to tell the reader the essence of a central idea. The simple concept of a summary is to take an essential part of the entire contents of the article. Which then presents it back in summary form. The steps in this research will start with the user selecting or searching for text documents that will be summarized with keywords in the abstract as a query. The proposed approach performs text preprocessing for documents: sentence breaking, case folding, word tokenizing, filtering, and stemming. The results of the preprocessed text are weighted by term frequency-inverse document frequency (tf-idf), then weighted for query relevance using the vector space model and sentence similarity using cosine similarity. The next stage is maximum marginal relevance for sentence extraction. The proposed approach provides comprehensive summarization compared with another approach. The test results are compared with manual summaries, which produce an average precision of 88%, recall of 61%, and f-measure of 70%.