Journal of Open Research Software (Jul 2022)
“StudySandboxx: A Tool for Scraping, Sandboxing, Preserving, and Preparing Interactive Web Sites for Use in Human-computer Interaction and Behavioral Studies”
Abstract
Research studies in human-computer interaction and computer-mediated behavioral psychology often rely on capturing user interaction data to characterize online behaviors. Institutional review board (IRB) considerations, site policies, and security or privacy concerns may force researchers to use screenshots or offline copies of pages of interest, instead of live websites, in their study designs. These substitutes reduce the fidelity and contextual realism of web content and often degrade interface aesthetic quality because of broken links, missing images, or malfunctioning scripts. StudySandboxx is a tool that allows websites to be saved exactly as they appear online. The tool sandboxes each saved website by removing dangerous scripts that threaten privacy and security. Saved websites are encapsulated into a single portable file that contains all related website resources. Finally, the tool supports certain types of permutations commonly used in research, such as changing links in a page. The project is hosted in a GitHub repository at https://github.com/gewethor/study-sandbox.
Keywords