Opinerium: Subjective Question Generation Using Large Language Models

Pedram Babakhani; Andreas Lommatzsch; Torben Brodt; Doreen Sacker; Fikret Sivrikaya; Sahin Albayrak

doi:10.1109/ACCESS.2024.3398553

IEEE Access (Jan 2024)

Affiliations

Pedram Babakhani: DAI-Labor, Technische Universität Berlin, Berlin, Germany
Andreas Lommatzsch: DAI-Labor, Technische Universität Berlin, Berlin, Germany
Torben Brodt: Opinary GmbH, Berlin, Germany
Doreen Sacker: Opinary GmbH, Berlin, Germany
Fikret Sivrikaya: GT-ARC gGmbH, Berlin, Germany
Sahin Albayrak: DAI-Labor, Technische Universität Berlin, Berlin, Germany

Journal volume & issue: Vol. 12, pp. 66085–66099

Abstract

This paper presents a comprehensive study on generating subjective questions for news media posts to encourage public engagement with trending media topics. While previous studies focused primarily on factual, objective questions whose answers are explicit or implicit in the text, this research concentrates on automatically generating subjective questions that directly elicit personal preferences from individuals based on a given text. The methodology applies fine-tuning to multiple versions of the flan-T5 and GPT-3 architectures for the Seq2Seq generation task. The approach is evaluated on a custom dataset comprising 40,000 news articles paired with human-generated questions. In addition, the fine-tuned models are compared against zero-shot prompting of GPT-3.5, a significantly larger language model. The study addresses the challenges of evaluating opinion-based question generation, which stem from its subjective nature and the inherent uncertainty in determining answers. The two transformer architectures are compared using conventional lexical overlap metrics (BLEU, ROUGE, and METEOR), semantic similarity metrics (BERTScore and BLEURT), and answerability scores (QAScore and RQUGE). The findings show that the flan-T5 model clearly outperforms GPT-3, as supported by both quantitative metrics and human evaluations. The paper introduces Opinerium, based on the open-source flan-T5-Large model, which was identified as the best-performing model for generating subjective questions. Finally, all of the aforementioned metrics are assessed through pairwise Spearman correlation analysis to identify the most robust ones.
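
The pairwise Spearman correlation analysis mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (average_ranks, spearman, pairwise_spearman) are hypothetical, and the toy scores are made up for illustration rather than taken from the paper. It ranks each metric's per-question scores (averaging ranks for ties) and computes the Pearson correlation of the ranks for every pair of metrics.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs.

    Assumes both inputs have the same length and are not constant
    (a constant input has zero rank variance and would divide by zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


def pairwise_spearman(scores):
    """Map each unordered pair of metric names to their Spearman correlation."""
    return {
        (a, b): spearman(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


# Made-up per-question scores, for illustration only (not results from the paper).
toy_scores = {
    "BLEU": [0.10, 0.40, 0.35, 0.80],
    "BERTScore": [0.70, 0.82, 0.80, 0.95],
    "RQUGE": [0.90, 0.20, 0.50, 0.60],
}

correlations = pairwise_spearman(toy_scores)
```

In such an analysis, a metric whose rankings agree with several others (and with human judgments) would be a candidate for a robust metric, while one that correlates weakly or negatively with the rest would warrant closer inspection.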

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
