Humanities & Social Sciences Communications (Sep 2024)
AI hallucination: towards a comprehensive classification of distorted information in artificial intelligence-generated content
Abstract
Amidst the burgeoning information age, the rapid development of artificial intelligence-generated content (AIGC) has raised challenges regarding information authenticity, and the proliferation of distorted information has a significant negative impact on users. This study aims to systematically categorize distorted information within AIGC, examine its internal characteristics, and provide theoretical guidance for its management. Using ChatGPT as a case study, we conducted an empirical content analysis of 243 collected instances of distorted information, each comprising a question and its answer. Three coders carefully interpreted each instance, encoding error points according to a predefined coding scheme and categorizing them by error type. Our objective was to refine and validate, through multiple rounds of pre-coding and test coding, the category list of distorted information derived from the review, thereby yielding a comprehensive and clearly delineated category list of distorted information in AIGC. The findings identified eight first-level error types: “Overfitting”, “Logic errors”, “Reasoning errors”, “Mathematical errors”, “Unfounded fabrication”, “Factual errors”, “Text output errors”, and “Other errors”, which were further subdivided into 31 second-level error types. This classification not only lays a solid foundation for studying the risks associated with AIGC but also has significant practical implications, helping users identify distorted information and enabling developers to improve the quality of AIGC tools.