Journal of Universal Computer Science (Jun 2021)

Forecasting Air Travel Demand for Selected Destinations Using Machine Learning Methods

  • Murat Firat,
  • Derya Yiltas-Kaplan,
  • Ruya Samli

DOI
https://doi.org/10.3897/jucs.68185
Journal volume & issue
Vol. 27, no. 6
pp. 564 – 581

Abstract

Over the past decades, air transportation has expanded and the era of big data in transportation has emerged. Accurate travel demand information is important for transportation systems, especially for the airline industry. Consequently, the “optimal seat capacity problem between origin and destination pairs”, which is related to the load factor, must be solved. This study introduces a method for determining the optimal seat capacity that yields the highest load factor for flight operations between any two countries. The machine learning methods Artificial Neural Network (ANN), Linear Regression (LR), Gradient Boosting (GB), and Random Forest (RF) have been applied, and software has been developed to solve the problem. A data set generated from the World Bank Database, which contains thousands of features for all countries, has been used, and a case study has been carried out with Turkish Airlines for the period 2014-2019. To the best of our knowledge, this is the first time in the literature that 1983 features have been used to forecast air travel demand within a model that covers all countries, whereas previous studies cover only a few countries and use far fewer features. Another valuable aspect of this study is its use of the most recent regular air transportation data from before the COVID-19 pandemic: since many airlines experienced a decline in air travel operations in 2020 due to the pandemic, the study covers the most recent period (2014-2019) in which flights operated on a regular basis. The developed model forecast the passenger load factor with an average error rate of 6.741% with GB, 6.763% with RF, 8.161% with ANN, and 9.619% with LR.
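
The two quantities the abstract reports, the load factor and the average error rate of each method, can be illustrated with a minimal sketch. The abstract does not define either formula, so this sketch assumes a seat-based load factor (passengers carried divided by seats offered) and a mean absolute percentage error. The function names and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def load_factor(passengers: float, seats: float) -> float:
    """Seat-based load factor: share of offered seats that were filled (assumed definition)."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return passengers / seats


def mean_absolute_percentage_error(actual, predicted) -> float:
    """Average of |actual - predicted| / |actual|, expressed in percent (assumed error metric)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical observed load factors for three routes (not data from the study).
actual = [0.80, 0.75, 0.90]

# Hypothetical forecasts from two of the four methods named in the abstract.
forecasts = {
    "GB": [0.78, 0.77, 0.88],
    "LR": [0.70, 0.80, 0.95],
}

# Score each method and pick the one with the lowest average error.
errors = {name: mean_absolute_percentage_error(actual, pred)
          for name, pred in forecasts.items()}
best = min(errors, key=errors.get)

for name, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name}: {err:.3f}%")
print("lowest average error:", best)
```

Ranking the methods by this kind of score mirrors the comparison at the end of the abstract, where GB and RF show the lowest error rates and LR the highest.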

Keywords