T2F: a domain-agnostic multi-agent framework for unstructured text to factuality evaluation items generation

Xin Tong; Jingya Wang; Yasen Aizezi; Hanming Zhai; Bo Jin

doi:10.1007/s44163-025-00294-w

Discover Artificial Intelligence (May 2025)

Affiliations

Xin Tong: School of Information and Network Security, People’s Public Security University of China
Jingya Wang: School of Information and Network Security, People’s Public Security University of China
Yasen Aizezi: Department of Information Security Engineering, Xinjiang Police College
Hanming Zhai: School of Information and Network Security, People’s Public Security University of China
Bo Jin: National Engineering Research Center of Classified Protection and Safeguard Technology for Cybersecurity, The Third Research Institute of the Ministry of Public Security of China

DOI: https://doi.org/10.1007/s44163-025-00294-w
Journal volume & issue: Vol. 5, no. 1
pp. 1 – 16

Abstract

Large language models (LLMs) demonstrate exceptional linguistic capabilities in text generation but remain prone to factual errors, particularly in specialized domains. Traditional factuality evaluation methods rely primarily on human annotation, which is costly, time-consuming, and difficult to generalize across domains. To address these limitations, this study proposes T2F (Text-to-Factuality), a multi-agent framework designed to automatically convert unstructured text into high-quality factuality evaluation datasets. T2F operates through the coordinated efforts of four specialized agents: Analysis, Information Extraction, Generation, and Validation. By systematically processing input data, T2F autonomously generates multiple types of assessment items (single-choice questions, fill-in-the-blank questions, and true/false statements) without requiring human annotation, while maintaining strong cross-domain applicability. Experimental results demonstrate that T2F achieves data conversion success rates of 99% in the World Heritage domain, 98% in the Medical domain, and 85% in the Film domain. The generated data effectively assess the factual accuracy of LLMs, highlighting T2F’s capability as a scalable and domain-agnostic factuality evaluation framework.
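The four-stage flow named in the abstract (Analysis, Information Extraction, Generation, Validation) can be pictured with a minimal sketch. This is not the authors' implementation: the abstract does not describe how each agent works, so every function below is a hypothetical, rule-based stand-in for what would presumably be an LLM-backed agent, and the names `Item`, `analyze`, `extract`, `generate`, `validate`, and `run_pipeline` are assumptions made for illustration. Single-choice items are omitted because they would need distractor generation, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    """One evaluation item (hypothetical structure, not from the paper)."""
    kind: str      # "true_false" or "fill_in_the_blank"
    question: str
    answer: str


def analyze(text: str) -> List[str]:
    """Stand-in for the Analysis agent: split the text into sentences."""
    return [s.strip() for s in text.split(".") if s.strip()]


def extract(sentences: List[str]) -> List[Tuple[str, str]]:
    """Stand-in for Information Extraction: keep 'X is Y' statements."""
    facts = []
    for sentence in sentences:
        subject, sep, obj = sentence.partition(" is ")
        if sep and subject.strip() and obj.strip():
            facts.append((subject.strip(), obj.strip()))
    return facts


def generate(facts: List[Tuple[str, str]]) -> List[Item]:
    """Stand-in for Generation: build two item types per extracted fact."""
    items = []
    for subject, obj in facts:
        items.append(Item("true_false", f"{subject} is {obj}.", "True"))
        items.append(Item("fill_in_the_blank", f"{subject} is ____.", obj))
    return items


def validate(items: List[Item], source: str) -> List[Item]:
    """Stand-in for Validation: keep items grounded in the source text."""
    kept = []
    for item in items:
        if item.kind == "true_false":
            grounded = item.question.rstrip(".") in source
        else:
            grounded = item.answer in source
        if grounded:
            kept.append(item)
    return kept


def run_pipeline(text: str) -> List[Item]:
    """Chain the four stages in the order listed in the abstract."""
    return validate(generate(extract(analyze(text))), text)


if __name__ == "__main__":
    sample = "The Great Wall is a World Heritage site. It rains"
    for item in run_pipeline(sample):
        print(item.kind, "|", item.question, "|", item.answer)
```

Keeping each stage as a separate function with plain inputs and outputs means any one of them could be swapped for a model-backed agent without touching the others, which is one plausible reading of how such a framework stays domain-agnostic.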

Published in Discover Artificial Intelligence

ISSN: 2731-0809 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44163
