Bias in Deep Neural Networks in Land Use Characterization for International Development

Do-Hyung Kim; Guzmán López; Diego Kiedanski; Iyke Maduako; Braulio Ríos; Alan Descoins; Naroa Zurutuza; Shilpa Arora; Christopher Fabian

doi:10.3390/rs13152908

Remote Sensing (Jul 2021)

Affiliations

Do-Hyung Kim: Office of Innovation, UNICEF, New York, NY 10017, USA
Guzmán López: Tryolabs, Montevideo 11300, Uruguay
Diego Kiedanski: Tryolabs, Montevideo 11300, Uruguay
Iyke Maduako: Office of Innovation, UNICEF, New York, NY 10017, USA
Braulio Ríos: Tryolabs, Montevideo 11300, Uruguay
Alan Descoins: Tryolabs, Montevideo 11300, Uruguay
Naroa Zurutuza: Office of Innovation, UNICEF, New York, NY 10017, USA
Shilpa Arora: Office of Innovation, UNICEF, New York, NY 10017, USA
Christopher Fabian: Office of Innovation, UNICEF, New York, NY 10017, USA

DOI: https://doi.org/10.3390/rs13152908
Journal volume & issue: Vol. 13, no. 15, p. 2908

Abstract

Understanding the biases in Deep Neural Network (DNN)-based algorithms is becoming increasingly important as these algorithms are applied to more real-world problems. A known problem, that DNNs penalize underrepresented populations, could undermine the efficacy of development projects that depend on data produced by DNN-based models. Despite this, biases in DNNs for Land Use and Land Cover Classification (LULCC) have received little study. In this study, we explore ways to quantify biases in DNNs for land use, using the identification of school buildings in Colombia from satellite imagery as an example. We implemented a DNN-based model for school building identification by fine-tuning an existing pre-trained model. The model achieved an overall accuracy of 84%. We then used socioeconomic covariates to analyze possible biases in the learned representation. The retrained deep neural network was used to extract visual features (embeddings) from satellite image tiles. The embeddings were clustered into four subtypes of schools, and the accuracy of the neural network model was assessed for each cluster. The distributions of various socioeconomic covariates across clusters were analyzed to identify links between model accuracy and those covariates. Our results indicate that model accuracy is lowest (57%) where the characteristics of the landscape are predominantly related to poverty and remoteness, which confirms our original assumption about the heterogeneous performance of Artificial Intelligence (AI) algorithms and their biases. Based on our findings, we identify possible sources of bias and suggest how to prepare a balanced training dataset that would result in less biased AI algorithms. The framework used in our study to better understand biases in DNN models would be useful when Machine Learning (ML) techniques are adopted in lieu of ground-based data collection for international development programs. Because such programs aim to address social inequality, ML techniques are only applicable when they are transparent and accountable.
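
The per-cluster assessment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a clustering algorithm, so plain k-means is assumed here, and the embeddings, labels, and predictions are synthetic placeholders. The names `kmeans` and `accuracy_by_cluster` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def accuracy_by_cluster(labels, y_true, y_pred):
    """Map each cluster id to (accuracy, number of tiles in that cluster)."""
    correct = y_true == y_pred
    return {
        int(c): (float(correct[labels == c].mean()), int((labels == c).sum()))
        for c in np.unique(labels)
    }


# Synthetic stand-ins: 400 tiles with 16-dimensional embeddings (made-up values).
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(400, 16))
y_true = rng.integers(0, 2, size=400)  # 1 = school, 0 = not school
# Simulated classifier that agrees with the truth on roughly 80% of tiles.
y_pred = np.where(rng.random(400) < 0.8, y_true, 1 - y_true)

clusters = kmeans(embeddings, k=4)  # four clusters, as in the abstract
report = accuracy_by_cluster(clusters, y_true, y_pred)
for c, (acc, n) in sorted(report.items()):
    print(f"cluster {c}: accuracy={acc:.2f} over {n} tiles")
```

With real data, a cluster whose accuracy falls well below the overall figure would be the one to compare against socioeconomic covariates, which is the kind of comparison the abstract reports.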

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
