Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations

Konrad Herbst; Taiyao Wang; Elena J. Forchielli; Meghan Thommes; Ioannis Ch. Paschalidis; Daniel Segrè

doi:10.1038/s42003-024-06093-w

Communications Biology (Apr 2024)

Affiliations

Konrad Herbst: Bioinformatics Program, Boston University
Taiyao Wang: Division of Systems Engineering, Boston University
Elena J. Forchielli: Biological Design Center, Boston University
Meghan Thommes: Biological Design Center, Boston University
Ioannis Ch. Paschalidis: Division of Systems Engineering, Boston University
Daniel Segrè: Bioinformatics Program, Boston University

Journal volume & issue: Vol. 7, no. 1, pp. 1–11

Abstract

The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.
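
As an illustration of the kind of problem the abstract describes, one generic way to pose joint predictor selection and linear fitting as a mixed integer linear program is sketched below. This is not taken from the article and its actual formulation may differ. The notation is assumed for illustration only: P is the n × m phenotype matrix (strains by conditions), z_j is a binary variable marking condition j as a predictor, B_jk is the weight of predictor j when reconstructing condition k, e_ik is an auxiliary absolute-error variable, s is a chosen number of predictors, and M is a large constant bounding the coefficients. The absolute-error (L1) objective is also an assumption, chosen because it keeps the problem linear.

```latex
\begin{aligned}
\min_{z,\,B,\,e}\quad & \sum_{i=1}^{n}\sum_{k=1}^{m} e_{ik} \\
\text{s.t.}\quad
& -e_{ik} \le P_{ik} - \sum_{j=1}^{m} P_{ij}\,B_{jk} \le e_{ik} && \forall\, i,k \\
& -M z_j \le B_{jk} \le M z_j && \forall\, j,k \\
& \sum_{j=1}^{m} z_j = s \\
& z_j \in \{0,1\},\quad e_{ik} \ge 0
\end{aligned}
```

In this sketch the big-M constraint forces all coefficients of an unselected condition to zero, so only the s selected conditions can contribute to a prediction. A selected condition can reproduce itself exactly (B_jj = 1), which means the objective effectively measures the error on the response conditions alone.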

Published in Communications Biology

ISSN: 2399-3642 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General)
Website: https://www.nature.com/commsbio/
