Maximum Marginal Relevance and Vector Space Model for Summarizing Students' Final Project Abstracts

Gunawan Gunawan; Fitria Fitria; Esther Irawati Setiawan; Kimiya Fujisawa

doi:10.17977/um018v6i12023p57-68

Knowledge Engineering and Data Science (May 2023)

Affiliations

Gunawan Gunawan: Institut Sains dan Teknologi Terpadu Surabaya
Fitria Fitria: Institut Sains dan Teknologi Terpadu Surabaya
Esther Irawati Setiawan: Institut Sains dan Teknologi Terpadu Surabaya
Kimiya Fujisawa: Tokyo University of Technology

DOI: https://doi.org/10.17977/um018v6i12023p57-68
Journal volume & issue: Vol. 6, no. 1
pp. 57 – 68

Abstract

Automatic summarization uses a computer program to condense a text document into a summary that retains the essential parts of the original. It is necessary for coping with information overload as the amount of data keeps increasing. A summary presents the main contents of a long article in concise form so that the reader can grasp its central idea quickly; in its simplest form, it takes the essential parts of the article and presents them again in shortened form. In this research, the user first selects or searches for the text documents to be summarized, using the keywords in the abstract as the query. The proposed approach preprocesses each document through sentence breaking, case folding, word tokenizing, filtering, and stemming. The preprocessed text is weighted by term frequency-inverse document frequency (tf-idf); query relevance is then computed with the vector space model, and sentence-to-sentence similarity with cosine similarity. Finally, maximum marginal relevance is used for sentence extraction. The proposed approach provides more comprehensive summarization than an alternative approach. Compared with manual summaries, the test results show an average precision of 88%, recall of 61%, and f-measure of 70%.
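
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the smoothed idf formula, the trade-off parameter `lam`, and the summary length `k` are all hypothetical choices, each sentence is treated as one document when computing idf, and the stop-word filtering and stemming steps are omitted for brevity.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Case folding plus word tokenizing (filtering and stemming are omitted here).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(token_lists):
    # Assumption: each sentence counts as one "document" for the idf statistics.
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_summarize(sentences, query, k=2, lam=0.7):
    # Vector space model: sentences and query share one tf-idf term space.
    vectors, idf = tfidf_vectors([tokenize(s) for s in sentences])
    query_vec = {
        term: tf * idf.get(term, 0.0)
        for term, tf in Counter(tokenize(query)).items()
    }
    relevance = [cosine(vec, query_vec) for vec in vectors]

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Relevance to the query, penalized by similarity to sentences already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)

    # Return the extracted sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]
```

In this sketch, setting `lam` to 1.0 ranks sentences by query relevance alone, while lower values penalize sentences that repeat what has already been selected, so near-duplicate sentences are less likely to appear together in the summary.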

Published in Knowledge Engineering and Data Science

ISSN: 2597-4602 (Print); 2597-4637 (Online)
Publisher: Universitas Negeri Malang
Country of publisher: Indonesia
LCC subjects: Bibliography. Library science. Information resources: Information resources (General); Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journal2.um.ac.id/index.php/keds/index
