VertiBayes: learning Bayesian network parameters from vertically partitioned data with missing values

Florian van Daalen; Lianne Ippel; Andre Dekker; Inigo Bermejo

doi:10.1007/s40747-024-01424-0

Complex & Intelligent Systems (Apr 2024)

Affiliations

Florian van Daalen: Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Centre
Lianne Ippel: Methodology, Statistics Netherlands
Andre Dekker: Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Centre
Inigo Bermejo: Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Reproduction, Maastricht University Medical Centre

DOI: https://doi.org/10.1007/s40747-024-01424-0
Journal volume & issue: Vol. 10, no. 4
pp. 5317 – 5329

Abstract

Federated learning makes it possible to train a machine learning model on decentralized data. Bayesian networks are widely used probabilistic graphical models. While some research has been published on the federated learning of Bayesian networks, publications on Bayesian networks in a vertically partitioned data setting are limited, with important omissions, such as handling missing data. We propose a novel method called VertiBayes to train Bayesian networks (structure and parameters) on vertically partitioned data, which can handle missing values as well as an arbitrary number of parties. For structure learning, we adapted the K2 algorithm with a privacy-preserving scalar product protocol. For parameter learning, we use a two-step approach: first, we learn an intermediate model using maximum likelihood, treating missing values as a special value; then, we train a model on synthetic data generated by the intermediate model using the EM algorithm. The privacy guarantees of VertiBayes are equivalent to those provided by the privacy-preserving scalar product protocol used. We experimentally show that VertiBayes produces models comparable to those learnt using traditional algorithms. Finally, we propose two alternative approaches to estimate the performance of the model using vertically partitioned data, and we show in experiments that these give accurate estimates.
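As an illustration of the first parameter-learning step, the sketch below estimates a conditional probability table by maximum likelihood while treating a missing value as one more state. It is a minimal sketch under stated assumptions, not the authors' implementation: the joint counts are written as scalar products of indicator vectors held by two parties, and a plain dot product stands in for the privacy-preserving scalar product protocol. The `MISSING` marker, the function names, and the toy columns are all hypothetical.

```python
MISSING = "?"  # hypothetical marker; a missing value is treated as its own state


def indicator(column, value):
    """Return a 0/1 vector marking the records where column equals value."""
    return [1 if v == value else 0 for v in column]


def scalar_product(u, v):
    # ASSUMPTION: plain dot product used as a stand-in. In a real vertically
    # partitioned setting this would be a privacy-preserving protocol, so that
    # neither party reveals its indicator vector to the other.
    return sum(a * b for a, b in zip(u, v))


def intermediate_cpt(parent_col, child_col):
    """Maximum likelihood estimate of P(child | parent) from joint counts."""
    parent_states = sorted(set(parent_col))
    child_states = sorted(set(child_col))
    cpt = {}
    for p in parent_states:
        p_ind = indicator(parent_col, p)
        # Joint count N(parent=p, child=c) as a scalar product of indicators.
        counts = {
            c: scalar_product(p_ind, indicator(child_col, c))
            for c in child_states
        }
        total = sum(counts.values())  # > 0 because p occurs in parent_col
        cpt[p] = {c: counts[c] / total for c in child_states}
    return cpt


# Toy data: party A holds "smoker", party B holds "cancer".
# Rows are assumed to be aligned on the same records.
smoker = ["yes", "yes", "no", "no", MISSING, "yes"]
cancer = ["yes", MISSING, "no", "no", "no", "yes"]

cpt = intermediate_cpt(smoker, cancer)
for parent_state, row in cpt.items():
    print(parent_state, row)
```

The resulting table keeps `?` as an ordinary state, which corresponds to the intermediate model in the abstract. The second step, generating synthetic data from this model and fitting the final parameters with the EM algorithm, is not shown here.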

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
