Technologies (Aug 2024)
Explainable Graph Neural Networks: An Application to Open Statistics Knowledge Graphs for Estimating House Prices
Abstract
In the rapidly evolving field of real estate economics, the prediction of house prices continues to be a complex challenge, intricately tied to a multitude of socio-economic factors. Traditional predictive models often overlook spatial interdependencies that significantly influence housing prices. The objective of this study is to leverage Graph Neural Networks (GNNs) on open statistics knowledge graphs to model these spatial dependencies and predict house prices across Scotland’s 2011 data zones. The methodology involves retrieving integrated statistical indicators from the official Scottish Open Government Data portal and applying three representative GNN algorithms: ChebNet, GCN, and GraphSAGE. These GNNs are compared against traditional models, including the tabular-based XGBoost and a simple Multi-Layer Perceptron (MLP), demonstrating superior prediction accuracy. Innovative contributions of this study include the use of GNNs to model spatial dependencies in real estate economics and the application of local and global explainability techniques to enhance transparency and trust in the predictions. The global feature importance is determined by a logistic regression surrogate model while the local, region-level understanding of the GNN predictions is achieved through the use of GNNExplainer. Explainability results are compared with those from a previous work that applied the XGBoost machine learning algorithm and the SHapley Additive exPlanations (SHAP) explainability framework on the same dataset. Interestingly, both the global surrogate model and the SHAP approach underscored the comparative illness factor, a health indicator, and the ratio of detached dwellings as the most crucial features in the global explainability. In the case of local explanations, while both methods showed similar results, the GNN approach provided a richer, more comprehensive understanding of the predictions for two specific data zones.
Keywords