Mathematics (Aug 2024)

Identification of Patterns in CO<sub>2</sub> Emissions among 208 Countries: K-Means Clustering Combined with PCA and Non-Linear <i>t</i>-SNE Visualization

  • Ana Lorena Jiménez-Preciado,
  • Salvador Cruz-Aké,
  • Francisco Venegas-Martínez

DOI
https://doi.org/10.3390/math12162591
Journal volume & issue
Vol. 12, no. 16
p. 2591

Abstract


This paper identifies patterns in total and per capita CO2 emissions among 208 countries, considering different emission sources such as cement, flaring, gas, oil, and coal. The research uses linear and non-linear dimensionality reduction techniques, combining K-means clustering with principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) to identify distinct emission profiles among nations. This approach clusters heterogeneous countries effectively despite the high-dimensional nature of emissions data. The optimal number of clusters, determined using the Calinski–Harabasz and Davies–Bouldin scores, is five for total CO2 emissions and six for per capita CO2 emissions. The findings reveal that, for total emissions, t-SNE groups the world’s largest economies and emitters, i.e., China, the USA, India, and Russia, into a single cluster, while PCA produces single-country clusters for China, the USA, and Russia. Regarding per capita emissions, PCA generates a cluster containing only one country, Qatar, owing to its significant flaring emissions (a byproduct of its oil industry) and its small population. This study concludes that international collaboration and coherent global policies are crucial for effectively addressing CO2 emissions and developing targeted climate change mitigation strategies.
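The pipeline summarized above (project the per-source emissions to a low-dimensional space with PCA or t-SNE, run K-means, and compare candidate cluster counts with the Calinski–Harabasz and Davies–Bouldin scores) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn is available. It also assumes that K-means is run on a 2-D embedding, that the data are log-transformed and standardized first, and that the hyperparameters (perplexity, `n_init`, seeds, range of k) take the illustrative values shown. The helper names `embed`, `choose_k`, and `cluster` are hypothetical, and the random data in the usage note stand in for the real 208-country dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score
from sklearn.preprocessing import StandardScaler

# Hypothetical column order: one emission source per feature.
SOURCES = ["cement", "flaring", "gas", "oil", "coal"]


def embed(X, method, seed=0):
    """Project emissions data to 2-D with PCA (linear) or t-SNE (non-linear)."""
    # Assumption: log1p tames the heavy right skew of emissions, then features
    # are standardized; the abstract does not state the authors' preprocessing.
    Z = StandardScaler().fit_transform(np.log1p(X))
    if method == "pca":
        return PCA(n_components=2, random_state=seed).fit_transform(Z)
    if method == "tsne":
        # Perplexity must be smaller than the number of samples (countries).
        tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=seed)
        return tsne.fit_transform(Z)
    raise ValueError(f"unknown method: {method}")


def choose_k(E, k_range=range(2, 11), seed=0):
    """Return {k: (Calinski-Harabasz, Davies-Bouldin)} for K-means on embedding E.

    Higher Calinski-Harabasz and lower Davies-Bouldin indicate better-separated,
    more compact clusters.
    """
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(E)
        scores[k] = (calinski_harabasz_score(E, labels),
                     davies_bouldin_score(E, labels))
    return scores


def cluster(X, method, k, seed=0):
    """Embed X with the chosen method, then assign each row to one of k clusters."""
    E = embed(X, method, seed)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(E)
    return E, labels
```

For example, with `X = np.random.default_rng(0).lognormal(size=(208, len(SOURCES)))` as placeholder data, `choose_k(embed(X, "pca"))` returns both scores for each candidate k, and `cluster(X, "tsne", k=5)` returns the 2-D coordinates and a cluster label per country. The abstract does not say how the two scores were reconciled when they disagree, so the sketch reports both rather than imposing a selection rule.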

Keywords