IEEE Access (Jan 2024)

A Data-Centric Contrastive Embedding Framework for Contextomized Quote Detection

  • Seonyeong Song,
  • Jiyoung Han,
  • Kunwoo Park

DOI
https://doi.org/10.1109/ACCESS.2024.3377227
Journal volume & issue
Vol. 12
pp. 40168 – 40181

Abstract

Quotations are essential in lending credibility to news articles. A direct quote, typically enclosed in quotation marks, not only stands out visually but also signals a reliable source. However, in a practice known as ‘contextomizing,’ words are extracted from their original context in a way that changes the speaker’s intended meaning. The result is a headline quote that semantically diverges from all quotes in the main article. This misrepresentation can lead to misunderstandings, especially in online environments where information is often consumed solely through headlines. To address this issue, this paper introduces QuoteCSE++, a data-centric contrastive embedding framework designed to represent quote semantics. Using knowledge about the data and the news domain, QuoteCSE++ enhances a BERT-like transformer encoder to represent the complex semantics of news quotes and enables accurate detection of articles with contextomized headline quotes. Our evaluation experiments show that the proposed method outperforms both general-purpose embedding and domain-adapted methods in detection accuracy. Notably, the proposed method exhibits a few-shot detection capability, matching the performance of SimCSE with just 200 training samples. We also test the framework on the more general task of retrieving relevant quotes, which suggests its potential contribution to related fields. We release a dataset of 3,000 examples with high-quality manual annotations to support future research. Code and dataset are available at https://github.com/ssu-humane/contextomized-quotes-access.
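
To make the notion of a headline quote that diverges from every body quote concrete, the sketch below scores a headline quote embedding against the embeddings of the body quotes and flags the article when even the closest body quote is dissimilar. This is an illustration under stated assumptions, not the detection procedure of QuoteCSE++: the function names, the toy vectors, and the 0.5 threshold are all hypothetical, and the paper's actual detector may work differently (for example, with a trained classifier on top of the encoder).

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def max_body_similarity(
    headline_vec: Sequence[float],
    body_vecs: Sequence[Sequence[float]],
) -> float:
    """Similarity between the headline quote and its closest body quote."""
    if not body_vecs:
        raise ValueError("article has no body quotes to compare against")
    return max(cosine(headline_vec, b) for b in body_vecs)


def looks_contextomized(
    headline_vec: Sequence[float],
    body_vecs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> bool:
    """Flag the headline quote if no body quote is semantically close to it."""
    return max_body_similarity(headline_vec, body_vecs) < threshold


# Toy 2-d vectors standing in for encoder outputs (assumed, for illustration).
faithful = looks_contextomized([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0]])
distorted = looks_contextomized([1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
```

In practice the vectors would come from a sentence encoder, and the cutoff would have to be tuned on labeled data rather than fixed in advance.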

Keywords