A natural language processing approach to detect inconsistencies in death investigation notes attributing suicide circumstances

Song Wang; Yiliang Zhou; Ziqiang Han; Cui Tao; Yunyu Xiao; Ying Ding; Joydeep Ghosh; Yifan Peng

doi:10.1038/s43856-024-00631-7

Communications Medicine (Oct 2024)

Affiliations

Song Wang: Cockrell School of Engineering, The University of Texas at Austin
Yiliang Zhou: Population Health Sciences, Weill Cornell Medicine
Ziqiang Han: School of Political Science and Public Administration, Shandong University
Cui Tao: Department of AI and Informatics, Mayo Clinic
Yunyu Xiao: Population Health Sciences, Weill Cornell Medicine
Ying Ding: School of Information, The University of Texas at Austin
Joydeep Ghosh: Cockrell School of Engineering, The University of Texas at Austin
Yifan Peng: Population Health Sciences, Weill Cornell Medicine

DOI: https://doi.org/10.1038/s43856-024-00631-7
Journal volume & issue: Vol. 4, no. 1
pp. 1 – 13

Abstract

Background: Data accuracy is essential for scientific research and policy development. Data from the National Violent Death Reporting System (NVDRS) are widely used to discover patterns and contributing factors of death. Recent studies have suggested that annotation inconsistencies exist within the NVDRS and may lead to erroneous suicide-circumstance attributions.

Methods: We present an empirical Natural Language Processing (NLP) approach to detect annotation inconsistencies and adopt a cross-validation-like paradigm to identify possible label errors. We analyzed 267,804 suicide death incidents between 2003 and 2020 from the NVDRS. We measured annotation inconsistency by the degree of change in the F-1 score.

Results: Incorporating the target state's data into training the suicide-circumstance classifier increases the F-1 score by 5.4% on the target state's test set and decreases it by 1.1% on the other states' test sets.

Conclusions: We present an NLP framework to detect annotation inconsistencies, show its effectiveness in identifying and rectifying possible label errors, and propose a solution to improve the coding consistency of human annotators.
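The inconsistency measure described in the abstract, the change in F-1 score when a target state's data is added to training, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `f1_score` and `f1_change` are hypothetical, the classifier itself is omitted, and the toy labels are invented for illustration only and are not NVDRS data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F-1 score for labels in {0, 1}; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_change(
    y_true: Sequence[int],
    pred_baseline: Sequence[int],
    pred_with_target: Sequence[int],
) -> float:
    """F-1 difference between two models evaluated on the same test set.

    pred_baseline: predictions from a model trained WITHOUT the target state.
    pred_with_target: predictions from a model trained WITH the target state.
    """
    return f1_score(y_true, pred_with_target) - f1_score(y_true, pred_baseline)


# Toy labels (assumed for illustration): one circumstance, six incidents.
target_true = [1, 1, 1, 0, 0, 1]
target_pred_baseline = [1, 0, 0, 0, 1, 1]
target_pred_with_target = [1, 1, 1, 0, 0, 1]

delta_target = f1_change(target_true, target_pred_baseline, target_pred_with_target)
print(f"F-1 change on target state's test set: {delta_target:+.3f}")
```

Under this reading, a large gain on the target state's test set paired with a loss on the other states' test sets would suggest that the target state's annotations follow a different labeling pattern from the rest.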

Published in Communications Medicine

ISSN: 2730-664X (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine
Website: https://www.nature.com/commsmed/
