Journal of Medical Internet Research (Dec 2024)

From Doubt to Confidence—Overcoming Fraudulent Submissions by Bots and Other Takers of a Web-Based Survey

  • Jeffrey J Hardesty,
  • Elizabeth Crespi,
  • Joshua K Sinamo,
  • Qinghua Nian,
  • Alison Breland,
  • Thomas Eissenberg,
  • Ryan David Kennedy,
  • Joanna E Cohen

DOI
https://doi.org/10.2196/60184
Journal volume & issue
Vol. 26
p. e60184

Abstract

In 2019, we launched a web-based longitudinal survey of adults who frequently use e-cigarettes, called the Vaping and Patterns of E-cigarette Use Research (VAPER) Study. The initial attempt to collect survey data failed due to fraudulent survey submissions, likely submitted by survey bots and other survey takers. This paper chronicles the journey from that setback to the successful completion of 5 waves of data collection. The section “Naïve Beginnings” examines the study preparation phase, identifying the events, decisions, and assumptions that contributed to the failure (eg, allowing anonymous survey takers to submit surveys and overreliance on a third party’s proprietary fraud detection tool to identify participants attempting to submit multiple surveys). “A 5-Alarm Fire and Subsequent Investigation” summarizes the warning signs that suggested fraudulent survey submissions had compromised the data integrity after the initial survey launched (eg, an unanticipated acceleration in recruitment and a voicemail alleging fraudulent receipt of multiple gift codes). This section also covers the investigation process, along with conclusions regarding how the methodology was exploited (eg, clearing cookies and using virtual private networks) and the extent of the issue (ie, only 363 of 1624 survey completions, 22.4%, were likely valid). “Building More Resilient Methodology” details the vulnerabilities and threats that likely compromised the initial survey attempt (eg, anonymity and survey bots); the corresponding mitigation strategies and their benefits and limitations (eg, personal record verification platforms, IP address matching, virtual private network detection services, and CAPTCHA [Completely Automated Public Turing test to tell Computers and Humans Apart]); and the array of strategies that were implemented in future survey attempts.
“Staying Vigilant” recounts the identification and management of an additional threat that emerged despite the implementation of an array of mitigation strategies, underscoring the need for ongoing vigilance and adaptability. While the precise nature of the threat remains unknown, the evidence suggested that multiple fraudulent surveys were submitted by a single entity or by connected entities that likely did not possess e-cigarettes. To mitigate the chance of recurrence, participants were required to submit an authentic photo of their most used e-cigarette. Finally, in “Reflection 4 Years Later,” we share insights after completing 5 waves of data collection without uncovering additional threats or vulnerabilities that necessitated further mitigation strategies. Reflections include reasons for confidence in the data’s integrity, the scalability and cost-effectiveness of the study protocols, and the potential introduction of sampling bias through recruitment and mitigation strategies. By sharing our journey, we aim to provide valuable insights for researchers facing similar challenges with web-based surveys and those seeking to minimize such challenges a priori. Our experiences highlight the importance of proactive measures, continuous monitoring, and adaptive problem-solving to ensure the integrity of data collected from participants recruited from web-based platforms.