Scientific Data (Jun 2024)

Harmonizing existing climate change mitigation policy datasets with a hybrid machine learning approach

  • Libo Wu,
  • Zhihao Huang,
  • Xing Zhang,
  • Yushi Wang

DOI
https://doi.org/10.1038/s41597-024-03411-z
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 14

Abstract

With climate policies proliferating rapidly in both number and scope, there is growing demand for a global-level dataset that provides multi-indicator information on policy elements and their implementation contexts. To address this need, we developed the Global Climate Change Mitigation Policy Dataset (GCCMPD) using a semisupervised hybrid machine learning approach, drawing on policy information from global, regional, and sector-specific sources. Unlike existing climate policy datasets, the GCCMPD covers a broad range of policies: 73,625 policies from 216 entities. By integrating expert knowledge-based dictionary mapping, probability statistics methods, and advanced natural language processing, the GCCMPD offers detailed classification across multiple indicators and consistent information on sectoral policy instruments, including objectives, target sectors, instruments, legal compulsion, and administrative entities, among others. Because it aligns with the sector classification of the Intergovernmental Panel on Climate Change (IPCC) emission datasets, the GCCMPD can help policy-makers, researchers, and social organizations gain a deeper understanding of the similarities and differences among climate activities across countries, sectors, and entities.