IEEE Access (Jan 2024)

Opinerium: Subjective Question Generation Using Large Language Models

  • Pedram Babakhani
  • Andreas Lommatzsch
  • Torben Brodt
  • Doreen Sacker
  • Fikret Sivrikaya
  • Sahin Albayrak

DOI
https://doi.org/10.1109/ACCESS.2024.3398553
Journal volume & issue
Vol. 12
pp. 66085 – 66099

Abstract


This paper presents a comprehensive study on generating subjective questions for news media posts to encourage public engagement with trending media topics. While previous studies focused primarily on factual, objective questions whose answers are stated explicitly or implicitly in the text, this research concentrates on automatically generating subjective questions that directly elicit personal preferences from individuals based on a given text. The methodology fine-tunes multiple iterations of the flan-T5 and GPT-3 architectures for the Seq2Seq generation task. The approach is evaluated on a custom dataset comprising 40,000 news articles paired with human-generated questions. A comparative analysis using zero-shot prompting with GPT-3.5 additionally contrasts the fine-tuned models with a significantly larger language model. The study addresses the challenges of evaluating opinion-based question generation, which stem from its subjective nature and the inherent uncertainty in determining answers. The two transformer architectures are compared using conventional lexical overlap metrics (BLEU, ROUGE, and METEOR), semantic similarity metrics (BERTScore and BLEURT), and answerability scores (QAScore and RQUGE). The findings show that the flan-T5 model markedly outperforms GPT-3, a result supported by both quantitative metrics and human evaluations. The paper introduces Opinerium, based on the open-source flan-T5-Large model, which was identified as the best-performing model for generating subjective questions. Additionally, all of the aforementioned metrics are assessed through pairwise Spearman correlation analysis to identify robust metrics.
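To illustrate the pairwise Spearman correlation analysis mentioned at the end of the abstract, the following is a minimal sketch in plain Python. It is not the authors' code. The function names (`average_ranks`, `spearman`, `pairwise_spearman`) and the score values in `toy_scores` are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(scores):
    """Map each unordered pair of metric names to its Spearman correlation."""
    return {
        (a, b): spearman(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


# Hypothetical per-question scores for three metrics (illustrative only).
toy_scores = {
    "BLEU": [0.10, 0.25, 0.40, 0.55, 0.70],
    "ROUGE": [0.20, 0.30, 0.45, 0.50, 0.80],
    "BERTScore": [0.90, 0.70, 0.80, 0.60, 0.50],
}

if __name__ == "__main__":
    for pair, rho in pairwise_spearman(toy_scores).items():
        print(pair, round(rho, 3))
```

In such an analysis, metrics whose rankings of the generated questions agree with each other (and, where available, with human ratings) would be candidates for the robust metrics the abstract refers to. In practice a library routine such as `scipy.stats.spearmanr` would likely replace the hand-written functions.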

Keywords