International Journal of Infectious Diseases (Mar 2022)
Linguistic Pattern-infused Dual-channel BiLSTM with Attention to Generate Dengue Case Summaries from ProMED-mail database
Abstract
Purpose: International collaboration on infectious disease surveillance is critical but difficult, because the systems that countries use to share health information differ in transparency. ProMED-mail is the most comprehensive expert-curated medical platform and provides rich outbreak information about humans, animals, and plants worldwide. However, its unstructured alerts make analysis difficult. We therefore developed a text-summarization method that uses natural language processing to automatically extract important sentences from ProMED-mail alert articles, generating summaries that capture key information and facilitate decision-making in epidemic surveillance.
Methods & Materials: From ProMED-mail alerts spanning 1994 to 2019, we established a unique dengue corpus annotated by professionals, whose annotations achieved near-perfect agreement (Cohen's kappa statistic of 90%). To generate ProMED-mail summaries, we developed a dual-channel bidirectional long short-term memory (BiLSTM) network with an attention mechanism, infused with latent-syntactic features, to identify key sentences in the alerts.
Results: Our method outperformed many well-known machine learning and neural network approaches in identifying important sentences, achieving a macro-averaged F1-score of 93%. In addition to verifying the model, we recruited five experts from related fields to conduct a satisfaction survey on the generated summaries; 83.6% of the summaries received high satisfaction ratings.
Conclusion: The proposed approach successfully fuses latent-syntactic features into a deep neural network to analyze the syntactic content of the text, and then exploits the derived information to identify key sentences. When a new alert arrives, the case-relevant information that is essential for reference or further analysis can be identified quickly.
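The two statistics quoted above, Cohen's kappa for annotator agreement and the macro-averaged F1-score for sentence classification, can be illustrated with a minimal Python sketch. The function names, the binary labels (1 = key sentence, 0 = other sentence), and the toy data are assumptions made for illustration only; they are not the study's code or data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 1 = key sentence, 0 = other sentence.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 1, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
f1 = macro_f1(annotator_1, annotator_2)
```

Because the macro average weights every class equally, it is not inflated by good performance on the majority class alone, which matters when key sentences are a minority of all sentences in an alert.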