Jambura Journal of Mathematics (Aug 2024)

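Propensity Score Matching in the Use of Web-Scraped Data to Improve Official Statistics (original title: Propensity Score Matching Pada Pemanfaatan Data Hasil Web Scraping Untuk Perbaikan Statistik Resmi)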

  • Fatimah Fatimah,
  • Hari Wijayanto,
  • Farit Mochamad Afendi

DOI
https://doi.org/10.37905/jjom.v6i2.26568
Journal volume & issue
Vol. 6, no. 2
pp. 226 – 235

Abstract

The Central Statistics Agency (BPS) has taken up the challenge of using big data. One BPS publication that big data can support is the inflation figure, which is compiled from the consumer price survey. One component of that survey is the HK-4 Survey, which records house contract (rental) rates. To date, the house contract rates produced by BPS have been underestimates, that is, lower than actual conditions. This study corrects them by matching BPS data with data scraped from house rental websites using Propensity Score Matching (PSM). The data cover DKI Jakarta, Bandung, and Semarang from September to October 2023. The study aims to find the best PSM matching model for improving official statistics (house contract rates) by combining several propensity score estimation methods and matching algorithms; the matched results from the best model are then used to calculate corrected house contract rates. The results show that the best matching model generally estimates propensity scores with logistic regression and uses the nearest neighbor matching algorithm with replacement at a 1:1 ratio. The corrected contract rates are far above the official ones (corrections of 87.27% for DKI Jakarta, 316.15% for Bandung, and 60.04% for Semarang). Web scraping can help improve official statistics because it saves cost and time, enhances the quality of official statistical data, and supports better decision-making in various sectors.
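
The matching setup reported in the abstract (propensity scores from logistic regression, then 1:1 nearest neighbor matching with replacement) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' code. The covariates (floor area and bedrooms), the toy values, the group coding (1 = BPS survey record, 0 = web-scraped listing), and every function name are assumptions made for the example. The `percent_correction` helper is only one possible reading of the reported correction percentages; the abstract does not state the formula.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity_scores(X, w, b):
    """Estimated probability that each record belongs to group 1."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def match_nearest_with_replacement(scores, group):
    """1:1 nearest neighbor matching on the propensity score.

    Each group-1 record is paired with the group-0 record whose score is
    closest. Matching is with replacement, so one group-0 record may be
    reused for several group-1 records.
    """
    treated = [i for i, g in enumerate(group) if g == 1]
    control = [i for i, g in enumerate(group) if g == 0]
    return {i: min(control, key=lambda j: abs(scores[i] - scores[j]))
            for i in treated}


def percent_correction(official_rates, matched_rates):
    """Hypothetical correction measure: percent gap between the two means."""
    official_mean = sum(official_rates) / len(official_rates)
    matched_mean = sum(matched_rates) / len(matched_rates)
    return (matched_mean - official_mean) / official_mean * 100.0


# Toy data (invented): [floor area in 100 m^2, number of bedrooms].
X = [[0.36, 2], [0.45, 2], [0.60, 3], [0.40, 2],   # assumed BPS records
     [0.90, 3], [1.20, 4], [0.50, 2], [1.50, 4]]   # assumed scraped listings
group = [1, 1, 1, 1, 0, 0, 0, 0]
rent = [20, 24, 30, 22, 55, 80, 35, 95]            # invented annual rents

w, b = fit_logistic(X, group)
scores = propensity_scores(X, w, b)
pairs = match_nearest_with_replacement(scores, group)

official = [rent[i] for i in pairs]
matched = [rent[j] for j in pairs.values()]
print(pairs, percent_correction(official, matched))
```

Matching with replacement keeps every group-1 record paired with its closest available counterpart, at the cost of possibly reusing the same scraped listing several times; a production analysis would also check covariate balance after matching.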

Keywords