ISPRS International Journal of Geo-Information (Apr 2016)

Reconstructing Sessions from Data Discovery and Access Logs to Build a Semantic Knowledge Base for Improving Data Discovery

  • Yongyao Jiang,
  • Yun Li,
  • Chaowei Yang,
  • Edward M. Armstrong,
  • Thomas Huang,
  • David Moroni

DOI
https://doi.org/10.3390/ijgi5050054
Journal volume & issue
Vol. 5, no. 5
p. 54

Abstract

Big geospatial data are archived and made available through online web discovery and access. However, finding the right data for scientific research and application development is still a challenge. This paper aims to improve data discovery by mining user knowledge from log files. Specifically, it focuses on reconstructing user web sessions, a critical step for extracting usage patterns. Reconstructing user sessions from raw web logs has always been difficult, because most data portals do not record a session identifier. To address this problem, we propose two session identification methods: a time-clustering-based method and a time-referrer-based method. We also present the workflow of session reconstruction and discuss how to select appropriate thresholds for the relevant steps in the workflow. The proposed session identification methods and workflow are proven to be able to extract data access patterns, which support further analysis of user behavior and improve data discovery through more relevant data ranking, suggestion, and navigation.
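
To illustrate the problem of identifying sessions when logs carry no session identifier, the sketch below groups requests by client and starts a new session whenever the idle gap between consecutive requests exceeds a threshold. This is the generic inactivity-timeout heuristic, not the time-clustering-based or time-referrer-based method proposed in the paper, whose details the abstract does not give. The function name `split_sessions`, the `(ip, timestamp)` record shape, and the 30-minute default gap are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_sessions(records, gap=timedelta(minutes=30)):
    """Group (ip, timestamp) records into per-IP sessions.

    A new session starts when the time since the previous request
    from the same IP exceeds `gap` (an assumed threshold).
    """
    # Collect timestamps per client IP.
    by_ip = defaultdict(list)
    for ip, ts in records:
        by_ip[ip].append(ts)

    sessions = []
    for ip, stamps in by_ip.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > gap:
                # Idle gap too long: close the session and open a new one.
                sessions.append((ip, current))
                current = [ts]
            else:
                current.append(ts)
        sessions.append((ip, current))
    return sessions


# Hypothetical log records: two clients, one with a long idle gap.
records = [
    ("10.0.0.1", datetime(2016, 4, 1, 9, 0)),
    ("10.0.0.1", datetime(2016, 4, 1, 9, 10)),
    ("10.0.0.1", datetime(2016, 4, 1, 11, 0)),
    ("10.0.0.2", datetime(2016, 4, 1, 9, 5)),
]
sessions = split_sessions(records)
```

A fixed gap is the simplest choice. The threshold strongly affects how many sessions are produced, which is why the abstract treats threshold selection as a separate step in the workflow.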

Keywords