Виртуальная коммуникация и социальные сети (Apr 2025)

Automatic Generation of News Headlines Using RuGPT-3 Neural Network: Effect of Training Dataset on Model Performance

  • Fedor F. Shamigov,
  • Zoya I. Rezanova

DOI
https://doi.org/10.21603/2782-4799-2025-4-1-62-70
Journal volume & issue
Vol. 4, no. 1
pp. 62 – 70

Abstract

News agencies compete in the digital space, where success often depends on how promptly material is published, and automatic headline generation can speed this up. This study examined how the type of training dataset (individual news categories vs. their combination) affects the quality of automatically generated news headlines. The initial hypothesis was that training the RuGPT-3 model on thematic sets of articles and on their combination would yield different generated headlines. The authors used the RuGPT-3 model and news articles published by Lenta.ru. The research included three datasets: the categories of science and sports (6,900 articles each) and their combination (6,900 articles). The results confirmed the hypothesis: the model trained on the combined dataset generated higher-quality headlines as measured by the formal ROUGE metric, achieving an average F-score of 0.22 (compared to 0.17 for science and 0.2 for sports). The generated headlines looked authentic and conformed to good headline-writing practice, i.e., length (≤10 words), predicativity, past tense, active voice, no opening prepositions or figures, no relative time indicators, etc. However, the headlines were not always consistent with the content of the articles.
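
For readers unfamiliar with the ROUGE F-score mentioned in the abstract, the following is a minimal illustrative sketch of a unigram-overlap (ROUGE-1) F-score between a reference headline and a generated one. The abstract does not state which ROUGE variant, tokenizer, or library the authors used, so the function name `rouge1_f`, the whitespace tokenization, and the lowercasing are assumptions made for illustration only, not the authors' evaluation procedure.

```python
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap F-score (ROUGE-1 style); a hypothetical helper.

    Assumption: tokens are lowercased and split on whitespace. A real
    evaluation of Russian headlines would likely use a proper tokenizer
    and possibly lemmatization.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())  # share of generated tokens that match
    recall = overlap / sum(ref_counts.values())      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example with placeholder tokens, not data from the study.
    print(rouge1_f("a b c d", "a b x y"))
```

Averaging such per-headline scores over a test set gives a single figure comparable in kind to the average F-scores reported above, although the exact values depend on the variant and preprocessing used.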

Keywords