E3S Web of Conferences (Jan 2023)

Top-K Query for Large Dataset of Restaurant Review Based-on Hadoop MapReduce Framework

  • Nyoman Putri Utami Ni,
  • Wijayanto Heri,
  • Gede Putu Wirarama I.

DOI
https://doi.org/10.1051/e3sconf/202346502033
Journal volume & issue
Vol. 465
p. 02033

Abstract

High-quality facilities and services are essential to developing post-COVID-19 culinary tourism. Information technology can contribute through a decision-making system based on top-k queries. This study implements top-k queries on a distributed Hadoop MapReduce system to evaluate its capability in managing data and selecting culinary tourism potential. For the European restaurant data from TripAdvisor with 5 dimensions, single-node and multi-node (3 nodes) executions show comparable execution times across various data quantities. For the European restaurant data from TripAdvisor with 14 dimensions, by contrast, multi-node (3 nodes) execution tends to take longer than single-node execution at larger data quantities. For synthetic data with 5 dimensions, multi-node (3 nodes) execution becomes more efficient as the data quantity increases, with a significant difference in execution time compared to single-node execution. The study also finds that, across different dimensions, the multi-node (3 nodes) approach is generally faster than the single-node approach. Regarding node variations, processing 20 million data points with 5 dimensions on 6 nodes yields the shortest execution time. With information technology and a top-k query-based decision-making system, culinary tourism potential can be developed more efficiently and effectively. MapReduce performance on culinary tourism data can be optimized by using multi-node execution for large datasets and single-node execution for relatively small datasets.
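
To illustrate the general idea of a top-k query in a MapReduce style, the following is a minimal sketch in plain Python rather than Hadoop code, and it is not taken from the paper. It assumes each record is a restaurant identifier with a tuple of numeric attributes, and it assumes an unweighted sum as the scoring function; the names `score`, `map_top_k`, `reduce_top_k`, and `top_k` are hypothetical. Each simulated mapper keeps only the k best records of its input split, and a single reducer merges those partial lists.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple

# Assumed record layout: (restaurant id, numeric attribute values).
Record = Tuple[str, Sequence[float]]
Scored = Tuple[float, str]


def score(attrs: Sequence[float]) -> float:
    # Assumption: rank by the unweighted sum of all dimensions.
    # The scoring function used in the study is not stated in the abstract.
    return sum(attrs)


def map_top_k(split: Iterable[Record], k: int) -> List[Scored]:
    # Map phase: each split emits only its k highest-scoring records,
    # so at most k items per mapper reach the reducer.
    return heapq.nlargest(k, ((score(attrs), rid) for rid, attrs in split))


def reduce_top_k(partials: Iterable[List[Scored]], k: int) -> List[Scored]:
    # Reduce phase: merge the per-split candidates and keep the global k best.
    merged = (item for part in partials for item in part)
    return heapq.nlargest(k, merged)


def top_k(splits: Iterable[Iterable[Record]], k: int) -> List[Scored]:
    # Sequential stand-in for running the mappers on separate nodes.
    return reduce_top_k([map_top_k(split, k) for split in splits], k)


if __name__ == "__main__":
    splits = [
        [("a", (1, 2)), ("b", (5, 5))],
        [("c", (3, 3)), ("d", (0, 1))],
    ]
    print(top_k(splits, 2))
```

Because every mapper forwards at most k candidates, the reducer's input grows with the number of splits rather than with the dataset size, which is one reason this pattern is commonly used for distributed top-k queries.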