Automatic Title Generation in Scientific Articles for Authorship Assistance: A Summarization Approach

Jan Wira Gotama Putra; Masayu Leylia Khodra

doi:10.5614/itbj.ict.res.appl.2017.11.3.3

Journal of ICT Research and Applications (Dec 2017)

Affiliations

Jan Wira Gotama Putra: Department of Computer Science, School of Electrical Engineering & Informatics, Bandung Institute of Technology, Jalan Ganesa No.10, Bandung 40132
Masayu Leylia Khodra: Department of Computer Science, School of Electrical Engineering & Informatics, Bandung Institute of Technology, Jalan Ganesa No.10, Bandung 40132

DOI: https://doi.org/10.5614/itbj.ict.res.appl.2017.11.3.3
Journal volume & issue: Vol. 11, no. 3

Abstract

This paper presents a study on automatic title generation for scientific articles that considers sentence information types known as rhetorical categories. A title can be seen as a high-compression summary of a document. A rhetorical category is the type of information the author conveys in each textual unit, for example the background, method, or result of the research. The experiment in this study focused on extracting research purpose and research method information for inclusion in a computer-generated title. Sentences are classified into rhetorical categories, after which they are filtered using three methods. Three title candidates whose contents reflect the filtered sentences are then generated using a template-based or an adaptive K-nearest neighbor approach. The experiment was conducted on datasets from two domains: computational linguistics and chemistry. On average, computer-generated titles obtained an F1-measure of 0.109 to 0.255 when compared to the original titles. In a human evaluation, the automatically generated titles were deemed 'relatively acceptable' in the computational linguistics domain and 'not acceptable' in the chemistry domain. It can be concluded that rhetorical categories have unexplored potential to improve the performance of summarization tasks in general.
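
As an illustration of how an F1-measure between a generated title and an original title can be computed, the following is a minimal sketch. It assumes simple lowercase whitespace tokenization and unigram overlap; the abstract does not state the exact scoring procedure used in the study, so the function name title_f1 and the tokenization are hypothetical choices made for this example only.

```python
from collections import Counter


def title_f1(generated: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated title and a reference title.

    Assumption: tokens are lowercase, whitespace-separated words. This is
    an illustrative metric, not necessarily the one used in the paper.
    """
    gen_tokens = Counter(generated.lower().split())
    ref_tokens = Counter(reference.lower().split())

    # Multiset intersection counts each shared word at most as often
    # as it appears in both titles.
    overlap = sum((gen_tokens & ref_tokens).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(gen_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = title_f1(
        "title generation using templates",
        "automatic title generation",
    )
    print(f"F1 = {score:.3f}")
```

Precision here is the share of generated-title words that also occur in the original title, recall is the share of original-title words that the generated title recovers, and F1 is their harmonic mean.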

Published in Journal of ICT Research and Applications

ISSN: 2337-5787 (Print); 2338-5499 (Online)
Publisher: ITB Journal Publisher
Country of publisher: Indonesia
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://journals.itb.ac.id/index.php/jictra/index