Communications (Jul 2022)

SMOTE vs. Random Undersampling for Imbalanced Data - Car Ownership Demand Model

  • Wuttikrai Chaipanha
  • Patiphan Kaewwichian

DOI
https://doi.org/10.26552/com.C.2022.3.D105-D115
Journal volume & issue
Vol. 24, no. 3
pp. D105 – D115

Abstract


Because the number of cars reflects people's travel behavior in each specific location, the car ownership demand model plays a dominant role in travel demand analysis, helping to explain individual and household travel behavior in each area. However, the data from the study project for the Khon Kaen expressway master plan were imbalanced; that is, the majority class and the minority class were not of equal size. Before developing a machine learning model, this study proposed balancing the data with oversampling and undersampling techniques. The combination of SMOTE (Synthetic Minority Oversampling Technique) and kNN (k-nearest neighbors, k = 5) performed better than the other algorithms studied. Before balancing, the TPR (true positive rate) for rural and suburban areas, two region types with very different imbalance ratios, was 46.9 % and 46.4 %, respectively; after balancing, it rose to 63.5 % and 54.4 %.
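The abstract does not include the authors' implementation. As a rough illustration only, the pure-Python sketch below shows the ideas it relies on: random undersampling of the majority class, SMOTE-style oversampling (each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbors), a plain kNN classifier, and the TPR, TP / (TP + FN). The abstract's "kNN (k = 5)" is read here as the classifier; because SMOTE also uses k nearest neighbors internally, k = 5 is assumed in both places. All function names and the toy data are hypothetical. In practice a maintained library such as imbalanced-learn (`SMOTE`, `RandomUnderSampler`) would normally be used instead.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, majority_label, seed=0):
    """Binary case: randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    majority = [i for i, lab in enumerate(y) if lab == majority_label]
    minority = [i for i, lab in enumerate(y) if lab != majority_label]
    kept = sorted(rng.sample(majority, len(minority)) + minority)
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (core SMOTE idea).

    Each new point = base + gap * (neighbor - base), where base is a random
    minority sample, neighbor is one of its k nearest minority neighbors,
    and gap is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # k nearest minority neighbors of `base`, excluding `base` itself
        neighbors = sorted(
            (p for j, p in enumerate(X_min) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def true_positive_rate(y_true, y_pred, positive):
    """TPR = TP / (TP + FN) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Toy imbalanced data (hypothetical): 20 majority rows (label 0), 6 minority rows (label 1).
X = [[float(i), float(i % 3)] for i in range(20)] + [[10.0 + 0.5 * i, 5.0] for i in range(6)]
y = [0] * 20 + [1] * 6

# Option 1: random undersampling -> 6 majority + 6 minority rows
X_us, y_us = random_undersample(X, y, majority_label=0)

# Option 2: SMOTE oversampling -> add 14 synthetic minority rows (20 vs. 20)
X_min = [x for x, lab in zip(X, y) if lab == 1]
new_rows = smote(X_min, n_new=14, k=5)
X_os, y_os = X + new_rows, y + [1] * len(new_rows)

pred = knn_predict(X_os, y_os, [11.0, 5.0], k=5)
```

Undersampling discards majority-class information, while SMOTE keeps every original row and adds interpolated minority rows; which one works better depends on the data, which is the comparison the study reports.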

Keywords