Urban Science (Apr 2020)
Towards an Integrated Methodology for Model and Variable Selection Using Count Data: An Application to Micro-Retail Distribution in Urban Studies
Abstract
Over the last two decades, a growing number of works in urban studies have shown that micro-retail distribution is significantly related to specific properties of the urban built environment. While a wide variety of urban form measures have been investigated using sophisticated analytical approaches, the statistical procedures have not received the same attention. Several essential features of the statistical distribution of micro-retail, and of the modelling assumptions, are frequently overlooked, compromising the statistical robustness of the outcomes. In this work we focus on four main aspects: (i) the discrete, non-negative and highly skewed nature of store distribution; (ii) its zero-inflation; (iii) the assessment of the contextual effect; and (iv) the multicollinearity generated by the inclusion of highly related urban descriptors. To overcome these limitations, we propose an integrated methodological framework for modelling and variable selection, based on generalized linear models (GLMs) and elastic-net (Enet) penalized regression (PR), respectively. The procedure is tested on a real case study of the French Riviera, described using a large dataset of 105 street-based urban form measures. The outcomes show the superiority of the zero-inflated negative binomial count regression approach. A restricted number of urban form properties are found to be related to micro-retail distribution, depending on the specific scale and morphological context under analysis.
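As an illustration of the two ingredients named above, the following minimal sketch writes out a zero-inflated negative binomial probability mass function and an elastic-net penalty in plain Python. It is not the implementation used in the study. The parameterization (mean `mu`, dispersion `theta`, zero-inflation probability `pi`, penalty strength `lam`, mixing weight `alpha`) and all function names are assumptions chosen for illustration, following one common textbook convention.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion theta > 0.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated NB: a structural zero with probability pi,
    otherwise a draw from the negative binomial above."""
    base = (1.0 - pi) * nb_pmf(y, mu, theta)
    return pi + base if y == 0 else base


def enet_penalty(beta, lam, alpha):
    """Elastic-net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * squared L2).

    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    mu, theta, pi = 2.0, 0.5, 0.3
    print("P(Y = 0), plain NB:", nb_pmf(0, mu, theta))
    print("P(Y = 0), ZINB:    ", zinb_pmf(0, mu, theta, pi))
    print("Penalty:", enet_penalty([1.0, -2.0, 0.0], lam=0.5, alpha=0.5))
```

In this sketch the zero-inflation term adds probability mass at zero on top of what the negative binomial already assigns, which is how the model accommodates an excess of street segments without stores. The L1 part of the penalty can shrink coefficients of highly correlated descriptors exactly to zero, while the L2 part stabilizes the estimates of those that remain.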
Keywords