Machine Learning with Applications (Mar 2022)
Airbnb rental price modeling based on Latent Dirichlet Allocation and MESF-XGBoost composite model
Abstract
Airbnb price modeling is an important decision-making tool that determines the acceptability and profitability of the service. In this study, we demonstrated how proper descriptions of an Airbnb listing and location could influence determining the prices. We assumed the proper description of a listing property positively influences the renter’s decision making; therefore, we applied a Latent Dirichlet Allocation (LDA) based topic model for generating synthetic variables from the textual description of property aiming to improve price prediction accuracy. Additionally, we applied a Moran Eigenvector Spatial Filtering based XGBoost (MESF-XGBoost) model to address the spatial dependence of location data and improve prediction accuracy. Our study at the San Jose County Airbnb dataset found that the number of bedrooms, accommodations, property types, and the total number of reviews positively influence the listing price, whereas the absence of a super host badge and cancellation policy negatively influence the price. The experiment demonstrates that incorporating synthetic variables from both LDA and MESF into the model specification improves the prediction accuracy. The experiment reveals that the XGBoost model with only non-spatial features is not strong enough to address spatial dependence; therefore, it cannot minimize spatial autocorrelation issues.