PLoS ONE (Jan 2021)
Predicting the rental value of houses in household surveys in Tanzania, Uganda and Malawi: Evaluations of hedonic pricing and machine learning approaches.
Abstract
Housing value is a major component of the aggregate expenditure used to analyse the welfare status of households in the development economics literature. An accurate estimate of the value of housing services is therefore important when valuing housing in household surveys. Data show that a significant proportion of the houses covered by a typical Living Standard Measurement Survey (LSMS), adopted by the World Bank and others, are owner-occupied. Because owner-occupiers pay no rent, the standard approach in such surveys is to base the housing value on the predicted rental cost of the house. A hedonic pricing model estimated by Ordinary Least Squares (OLS) is normally used to predict rental values. The literature shows that Machine Learning (ML) methods, which uncover generalizable patterns in given data, have had better predictive power than OLS in other valuation exercises. We examined whether a class of ML methods (e.g. Ridge, LASSO, Tree, Bagging, Random Forest, and Boosting) predicted the rental value of housing better than OLS methods that account for spatial autocorrelation, using household-level survey data from Uganda, Tanzania, and Malawi across multiple years. Our results showed that the ML methods Boosting, Bagging, Random Forest, Ridge, and LASSO were the best models for predicting house values on out-of-sample data sets for all countries and all years. Tree regression, on the other hand, underperformed relative to the various OLS models on the same data sets. With the availability of abundant data and better computing power, ML methods provide a viable alternative for predicting housing values in household surveys.
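The out-of-sample comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the housing attributes (rooms, electricity, urban), the coefficients, and all function names are hypothetical, the data are simulated, spatial autocorrelation is ignored, and Ridge stands in for the wider class of ML methods. The sketch fits a hedonic log-rent model on a training sample and scores both estimators on a held-out sample.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(X, y, penalty=0.0):
    """Solve (X'X + penalty * I) beta = X'y.

    penalty = 0 gives OLS; penalty > 0 gives Ridge.
    Column 0 is the intercept and is left unpenalized.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for i in range(1, k):
        xtx[i][i] += penalty
    return solve(xtx, xty)


def rmse(X, y, beta):
    """Root mean squared prediction error of beta on (X, y)."""
    errs = [(sum(b * x for b, x in zip(beta, row)) - yi) ** 2 for row, yi in zip(X, y)]
    return math.sqrt(sum(errs) / len(errs))


def simulate(n, rng):
    """Simulate hypothetical houses: log rent depends on rooms, electricity, urban."""
    X, y = [], []
    for _ in range(n):
        rooms = rng.randint(1, 6)
        electricity = rng.randint(0, 1)
        urban = rng.randint(0, 1)
        log_rent = 2.0 + 0.25 * rooms + 0.4 * electricity + 0.6 * urban + rng.gauss(0, 0.3)
        X.append([1.0, rooms, electricity, urban])
        y.append(log_rent)
    return X, y


def compare(seed=0, n_train=300, n_test=200, penalty=1.0):
    """Fit OLS and Ridge on a training sample; return out-of-sample RMSE for each."""
    rng = random.Random(seed)
    X_train, y_train = simulate(n_train, rng)
    X_test, y_test = simulate(n_test, rng)
    results = {}
    for name, lam in (("ols", 0.0), ("ridge", penalty)):
        beta = fit_linear(X_train, y_train, lam)
        results[name] = rmse(X_test, y_test, beta)
    return results


if __name__ == "__main__":
    for name, score in compare().items():
        print(f"{name}: out-of-sample RMSE = {score:.3f}")
```

Because the simulated relationship is linear by construction, this sketch cannot reproduce the advantage the abstract reports for ML methods; it only shows the mechanics of holding out data and ranking models by prediction error.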