Applied Sciences (Jan 2025)

An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing

  • Wantong Yang,
  • Enze Wang,
  • Zhiwen Gui,
  • Yuan Zhou,
  • Baosheng Wang,
  • Wei Xie

DOI
https://doi.org/10.3390/app15020962
Journal volume & issue
Vol. 15, no. 2
p. 962

Abstract


Web application fuzzing faces significant challenges in achieving comprehensive coverage of test interfaces (the attack surface), primarily because of complex user interactions and dynamic website architectures. Web crawlers can automatically access and extract critical website information, such as form fields and request parameters, which is essential for generating effective fuzzing test cases. However, current crawler technologies exhibit three primary limitations: (i) insufficient capability to analyze relationships between pages and determine page states; (ii) a lack of functionality-aware exploration, which produces inputs with poor contextual relevance; and (iii) the generation of unstructured operation sequences that fail to execute because they are incompatible with state-based testing logic. To address these challenges, we propose CrawlMLLM, a framework that uses multi-modal large language models (MLLMs) to simulate human web browsing. It comprises three core components: page state mining, functionality analysis, and automatic operation generation. Evaluations show a 163% improvement in code coverage over state-of-the-art (SOTA) work. When integrated with vulnerability audit tools, CrawlMLLM found 44 vulnerabilities in three vulnerable web applications, compared with 34 found by the baseline. In six real-world applications, CrawlMLLM detected 20 vulnerabilities, while the next-best method found six.
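The abstract does not say how CrawlMLLM represents page states or replays operation sequences. As a rough illustration of limitations (i) and (iii), here is a minimal, hypothetical Python sketch of a state-aware crawl graph. Pages are grouped into states by a structural fingerprint, and each transition records the action that caused it, so the ordered operation sequence needed to reach any known state can be recovered by breadth-first search. The `Page` type, the fingerprint rule (URL path without query string plus sorted form-field names), and all other names are assumptions made for this example. CrawlMLLM itself uses a multi-modal LLM for state mining and operation generation, and that is not modeled here.

```python
from __future__ import annotations

import hashlib
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical page snapshot: URL plus the names of its form fields."""
    url: str
    form_fields: tuple[str, ...]


def state_fingerprint(page: Page) -> str:
    """Assumed state rule: same URL path and same set of form fields => same state."""
    path = page.url.split("?", 1)[0]  # ignore query strings such as session ids
    key = path + "|" + ",".join(sorted(page.form_fields))
    return hashlib.sha256(key.encode()).hexdigest()[:12]


class StateGraph:
    """Records states and the actions that move between them."""

    def __init__(self, start: Page) -> None:
        self.start = state_fingerprint(start)
        self.pages: dict[str, Page] = {self.start: start}
        self.edges: dict[str, list[tuple[str, str]]] = {self.start: []}

    def add_transition(self, src: Page, action: str, dst: Page) -> bool:
        """Store src --action--> dst; return True if dst is a previously unseen state."""
        s, d = state_fingerprint(src), state_fingerprint(dst)
        is_new = d not in self.pages
        self.pages.setdefault(s, src)
        self.pages.setdefault(d, dst)
        self.edges.setdefault(s, []).append((action, d))
        self.edges.setdefault(d, [])
        return is_new

    def path_to(self, target: Page) -> list[str] | None:
        """Shortest action sequence from the start state to target's state (BFS)."""
        goal = state_fingerprint(target)
        queue = deque([(self.start, [])])
        seen = {self.start}
        while queue:
            node, actions = queue.popleft()
            if node == goal:
                return actions
            for action, nxt in self.edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, actions + [action]))
        return None  # target state was never reached during crawling
```

For example, after recording `login --"submit login form"--> home` and `home --"click 'Edit profile'"--> profile`, calling `path_to(profile)` returns `["submit login form", "click 'Edit profile'"]`. This is an ordered, state-consistent sequence that a fuzzer could replay before mutating the profile form. Revisiting `/home?sid=2` maps to the existing home state, so the crawler does not treat it as new. A real system would need a richer notion of state than this URL-and-fields heuristic.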

Keywords