IEEE Access (Jan 2024)

Cost-Effective Event Mining on the Web via Event Source Page Discovery and Data API Construction

  • Yuan-Hao Lin,
  • Chia-Hui Chang,
  • Hsiu-Min Chuang,
  • Xiang-Shun Lin,
  • Ting Yeh,
  • Min-Jhao Hong

DOI
https://doi.org/10.1109/ACCESS.2024.3445448
Journal volume & issue
Vol. 12
pp. 115981 – 115993

Abstract

Automatically extracting meetup event information from the Internet can significantly enhance the discovery of activities. Existing methods for meetup event mining either rely on the open APIs provided by event-based social networks (EBSNs) to capture event data for designated regions and topics, or crawl the web comprehensively and filter the results for meetup events. Both approaches have limitations. In this study, we propose a novel four-stage framework for extracting meetup events from event organizers’ websites, consisting of event source page discovery, automatic pagination recognition, boilerplate removal, and event detection. Starting from potential event organizer websites obtained from Facebook events, we built 7,012 profile APIs and collected 520,909 published links between July 13, 2023, and June 24, 2024. The boilerplate remover extracted 289,541 pieces of valuable information from these links, and the event detection module identified 69,284 of them as event messages. The resulting event page ratio for these event organizers’ websites, 13.3% (69,284 of 520,909 links), is much higher than the 1% event page ratio of all websites, demonstrating the cost-effectiveness of the proposed approach.
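
The ratios quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable names are illustrative assumptions, and treating the 1% figure as a baseline for an "improvement factor" is one reading of the comparison, not a metric defined by the authors.

```python
# Counts as reported in the abstract (variable names are illustrative).
published_links = 520_909   # links collected via the profile APIs
extracted_items = 289_541   # items kept after boilerplate removal
event_messages = 69_284     # items flagged by the event detection module
baseline_ratio = 0.01       # event page ratio quoted for all websites


def ratio(part: int, whole: int) -> float:
    """Return part / whole as a fraction in [0, 1]."""
    return part / whole


# Share of collected links that turned out to be event pages.
event_page_ratio = ratio(event_messages, published_links)

# Share of collected links that survived boilerplate removal.
extraction_ratio = ratio(extracted_items, published_links)

# Assumed reading: how many times denser in events the organizer
# websites are compared with the quoted 1% baseline.
improvement_factor = event_page_ratio / baseline_ratio

print(f"event page ratio:   {event_page_ratio:.1%}")
print(f"extraction ratio:   {extraction_ratio:.1%}")
print(f"improvement factor: {improvement_factor:.1f}x")
```

Under these assumptions, the computed event page ratio rounds to the 13.3% stated in the abstract.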

Keywords