Scientific Data (Jun 2024)

A multimodal framework for extraction and fusion of satellite images and public health data

  • Dana Moukheiber,
  • David Restrepo,
  • Sebastián Andrés Cajas,
  • María Patricia Arbeláez Montoya,
  • Leo Anthony Celi,
  • Kuan-Ting Kuo,
  • Diego M. López,
  • Lama Moukheiber,
  • Mira Moukheiber,
  • Sulaiman Moukheiber,
  • Juan Sebastian Osorio-Valencia,
  • Saptarshi Purkayastha,
  • Atika Rahman Paddo,
  • Chenwei Wu,
  • Po-Chih Kuo

DOI
https://doi.org/10.1038/s41597-024-03366-1
Journal volume & issue
Vol. 11, no. 1
pp. 1–20

Abstract

In low- and middle-income countries, the substantial cost of traditional data collection is an obstacle to public health decision-making. Satellite imagery offers a potential solution, but image extraction and analysis can be costly and require specialized expertise. We introduce SatelliteBench, a scalable framework for satellite image extraction and vector embedding generation. We also propose a novel multimodal fusion pipeline that uses a series of satellite images and metadata. The framework was evaluated by generating a dataset of 12,636 images and embeddings, accompanied by comprehensive metadata, from 81 municipalities in Colombia between 2016 and 2018. The dataset was then evaluated on three tasks: dengue case prediction, poverty assessment, and access to education. The results demonstrate the versatility and practicality of SatelliteBench, which offers a reproducible, accessible, and open tool to enhance decision-making in public health.