PLoS ONE (Jan 2014)
Factors influencing performance of internet-based biosurveillance systems used in epidemic intelligence for early detection of infectious diseases outbreaks.
Abstract
BACKGROUND: Internet-based biosurveillance systems have been developed to detect health threats using information available on the Internet, but system performance has not been assessed relative to end-user needs and perspectives. METHOD AND FINDINGS: Infectious disease events from the French Institute for Public Health Surveillance (InVS) weekly international epidemiological bulletin published in 2010 were used to construct the gold-standard official dataset. Data from six biosurveillance systems were used to detect raw signals (infectious disease events from informal Internet sources): Argus, BioCaster, GPHIN, HealthMap, MedISys and ProMED-mail. Crude detection rates (C-DR), crude sensitivity rates (C-Se) and intrinsic sensitivity rates (I-Se) were calculated from multivariable regressions to evaluate the systems' performance (events detected compared to the gold-standard) 472 raw signals (Internet disease reports) related to the 86 events included in the gold-standard data set were retrieved from the six systems. 84 events were detected before their publication in the gold-standard. The type of sources utilised by the systems varied significantly (p<0001). I-Se varied significantly from 43% to 71% (p=0001) whereas other indicators were similar (C-DR: p=020; C-Se, p=013). I-Se was significantly associated with individual systems, types of system, languages, regions of occurrence, and types of infectious disease. Conversely, no statistical difference of C-DR was observed after adjustment for other variables. CONCLUSION: Although differences could result from a biosurveillance system's conceptual design, findings suggest that the combined expertise amongst systems enhances early detection performance for detection of infectious diseases. While all systems showed similar early detection performance, systems including human moderation were found to have a 53% higher I-Se (p=00001) after adjustment for other variables. Overall, the use of moderation, sources, languages, regions of occurrence, and types of cases were found to influence system performance.