Utterance-Aware Adaptive Data Labeling and Summarization: Exploiting Large Language Models for Unbiased Dialog Annotation

Nikita Glazkov; Ilya Makarov

doi:10.1109/ACCESS.2024.3476981

IEEE Access (Jan 2024)

Affiliations

Nikita Glazkov: ORCiD; AI Center, National University of Science and Technology (NUST) MISIS, Moscow, Russia
Ilya Makarov: ORCiD; AI Center, National University of Science and Technology (NUST) MISIS, Moscow, Russia

DOI: https://doi.org/10.1109/ACCESS.2024.3476981
Journal volume & issue: Vol. 12
pp. 150793 – 150806

Abstract

The field of dialogue summarization has advanced significantly with large language models (LLMs), but their effectiveness can be limited by the size and diversity of training data, as well as by concerns about bias. This study proposes a data augmentation method that addresses the lack of open-source dialogue datasets for summarization while reducing potential biases. Our method uses algorithms that process relationships between key phrases in a dialogue and its summary points, with two distinct approaches depending on whether the dialogue fits within the model’s context or exceeds it. We extract the necessary relationships between dialogue and summary using an LLM adapted to pre-labeled data, which achieves up to 88.26% accuracy compared with human annotation. We achieved a 4.33x expansion of the original DialogSum, SAMSum, and TweetSumm training sets, leading to a 0.16-point improvement in ROUGE-Lsum (up to 76% growth compared to the baseline). Additionally, we introduce a novel summarization metric tailored to larger-than-context summarization models during inference, capturing the semantic similarity and comprehensiveness of summary points. This metric contributes to the credibility and sustainability of dialogue summarization systems by providing a more robust evaluation framework.
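The abstract distinguishes dialogues that fit within the model's context from those that exceed it, without giving the procedure. As an illustration only, the following is a minimal sketch of one way a long dialogue could be split on utterance boundaries so that each piece fits a token budget. It is not the authors' algorithm: the function name `chunk_dialogue`, the `max_tokens` parameter, and the whitespace-based token count are all assumptions made for this example.

```python
from typing import List


def chunk_dialogue(utterances: List[str], max_tokens: int) -> List[List[str]]:
    """Group whole utterances into consecutive chunks within a token budget.

    Hypothetical helper: tokens are approximated by whitespace-separated
    words, which a real system would replace with the model's tokenizer.
    An utterance is never split, so a single utterance longer than
    max_tokens ends up alone in a chunk that exceeds the budget.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for utterance in utterances:
        n_tokens = len(utterance.split())
        # Close the current chunk if this utterance would overflow it.
        if current and used + n_tokens > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(utterance)
        used += n_tokens
    if current:
        chunks.append(current)
    return chunks


dialogue = [
    "A: hi there",
    "B: hello",
    "A: how are you today",
    "B: fine",
]
short_context = chunk_dialogue(dialogue, max_tokens=5)
long_context = chunk_dialogue(dialogue, max_tokens=100)
```

With a large budget the dialogue stays in one chunk, which corresponds to the smaller-than-context case; with a small budget it is split into several utterance-aligned chunks that could each be processed separately.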

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
