Journal of King Saud University: Computer and Information Sciences (Oct 2023)
WFP-Collector: Automated dataset collection framework for website fingerprinting evaluations on Tor Browser
Abstract
Website Fingerprinting (WFP) is a technique that analyses network traffic to infer which webpage a user visited in Tor Browser. A sufficiently large and clean dataset is essential for high-quality, accurate WFP experiments, so the dataset collection, filtering, and validation processes need to be automated. This work introduces WFP-Collector, an automated dataset collection framework for WFP experiments. WFP-Collector enables researchers to automatically (1) create a visit database for webpage crawling, (2) collect various data and logs for in-depth analysis, (3) visit webpages in tablet and mobile browsing modes, (4) throttle network bandwidth and latency, (5) validate and filter the collected data, (6) compress and upload the collected data to cloud storage, and (7) send completion notifications via Telegram and email. We developed a proof-of-concept of the WFP-Collector framework for simulation and comparison. We found that the WFP-Collector framework collects nine data items and produces a collected data size over 55% larger than that of existing approaches. The captured packet size in tablet and mobile browsing modes is up to 57.5% smaller than in desktop mode. Moreover, file compression in WFP-Collector can reduce the storage space required for data collection by up to 39.9%.