STAR Protocols (Sep 2023)

Protocol for the automatic extraction of epidemiological information via a pre-trained language model

  • Zhizheng Wang,
  • Xiao Fan Liu,
  • Zhanwei Du,
  • Lin Wang,
  • Ye Wu,
  • Petter Holme,
  • Michael Lachmann,
  • Hongfei Lin,
  • Zhuoyue Wang,
  • Yu Cao,
  • Zoie S.Y. Wong,
  • Xiao-Ke Xu,
  • Yuanyuan Sun

Journal volume & issue
Vol. 4, no. 3
p. 102392

Abstract

Summary: The lack of systems for automatically extracting epidemiological fields from open-access COVID-19 case reports delays the formulation of prevention measures. Here we present a protocol for using CCIE, a COVID-19 Cases Information Extraction system based on a pre-trained language model.1 We describe steps for preparing supervised training data and executing Python scripts for named entity recognition and text category classification. We then detail the use of machine evaluation and manual validation to illustrate the effectiveness of CCIE. For complete details on the use and execution of this protocol, please refer to Wang et al.2

Publisher's note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
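This summary does not spell out how the machine evaluation of named entity recognition is scored. As an illustration only, the Python sketch below assumes the supervised training data are labeled with BIO tags (B- begins an entity, I- continues it, O marks all other tokens) and computes entity-level precision, recall, and F1, counting a predicted entity as correct only when both its type and its boundaries match the gold annotation. The tag scheme, the entity types AGE, GENDER, and LOC, and the function names bio_to_spans and entity_prf are hypothetical and are not taken from CCIE.

```python
from typing import List, Set, Tuple

# An entity span: (entity type, start token index, end token index exclusive).
# The BIO scheme and entity types used here are hypothetical, not CCIE's own.
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    etype, start = None, None
    # Append a sentinel "O" so an entity that runs to the last token is closed.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # I- tag extending the currently open entity
        if etype is not None:
            spans.add((etype, start, i))  # close the open entity
            etype, start = None, None
        if tag.startswith(("B-", "I-")):
            # B- starts a new entity; a stray I- is also treated as a start.
            etype, start = tag[2:], i
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level (exact match) precision, recall, and F1 over sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # same type and same boundaries
        fp += len(pred_spans - gold_spans)  # predicted but not in gold
        fn += len(gold_spans - pred_spans)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical tokenized case sentence:
    # ["A", "45-year-old", "man", "in", "Wuhan", "tested", "positive"]
    gold = [["O", "B-AGE", "B-GENDER", "O", "B-LOC", "O", "O"]]
    pred = [["O", "B-AGE", "O", "O", "B-LOC", "O", "O"]]
    p, r, f1 = entity_prf(gold, pred)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    # prints: precision=1.000 recall=0.667 f1=0.800
```

Exact-match scoring is strict: a prediction with the right type but a shifted boundary counts as both a false positive and a false negative.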

Keywords