Open Geosciences (Aug 2023)

Handling imbalanced data in supervised machine learning for lithological mapping using remote sensing and airborne geophysical data

  • Nugroho Hary,
  • Wikantika Ketut,
  • Bijaksana Satria,
  • Saepuloh Asep

DOI
https://doi.org/10.1515/geo-2022-0487
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 16

Abstract

Read online

With balanced training sample (TS) data, learning algorithms offer good results in lithology classification. Meanwhile, unprecedented lithological mapping in remote places is predicted to be difficult, resulting in limited and unbalanced samples. To address this issue, we can use a variety of techniques, including ensemble learning (such as random forest [RF]), over/undersampling, class weight tuning, and hybrid approaches. This work investigates and analyses many strategies for dealing with imbalanced data in lithological classification based on RF algorithms with limited drill log samples using remote sensing and airborne geophysical data. The research was carried out at Komopa, Paniai District, Papua Province, Indonesia. The class weight tuning, oversampling, and balance class weight procedures were used, with TSs ranging from 25 to 500. The oversampling approach outperformed the class weight tuning and balance class weight procedures in general, with the following metric values: 0.70–0.80 (testing accuracy), 0.43–0.56 (F1 score), and 0.32–0.59 (Kappa score). The visual comparison also revealed that the oversampling strategy gave the most reliable classifications: if the imbalance ratio is proportionate to the coverage area in each lithology class, the classifier capability is optimal.

Keywords