Learning representations for image-based profiling of perturbations

Nikita Moshkov; Michael Bornholdt; Santiago Benoit; Matthew Smith; Claire McQuin; Allen Goodman; Rebecca A. Senft; Yu Han; Mehrtash Babadi; Peter Horvath; Beth A. Cimini; Anne E. Carpenter; Shantanu Singh; Juan C. Caicedo

doi:10.1038/s41467-024-45999-1

Nature Communications (Feb 2024)

Affiliations

Nikita Moshkov: HUN-REN Biological Research Centre
Michael Bornholdt: Broad Institute of MIT and Harvard
Santiago Benoit: Broad Institute of MIT and Harvard
Matthew Smith: Broad Institute of MIT and Harvard
Claire McQuin: Broad Institute of MIT and Harvard
Allen Goodman: Broad Institute of MIT and Harvard
Rebecca A. Senft: Broad Institute of MIT and Harvard
Yu Han: Broad Institute of MIT and Harvard
Mehrtash Babadi: Broad Institute of MIT and Harvard
Peter Horvath: HUN-REN Biological Research Centre
Beth A. Cimini: Broad Institute of MIT and Harvard
Anne E. Carpenter: Broad Institute of MIT and Harvard
Shantanu Singh: Broad Institute of MIT and Harvard
Juan C. Caicedo: Broad Institute of MIT and Harvard

DOI: https://doi.org/10.1038/s41467-024-45999-1
Journal volume & issue: Vol. 15, no. 1, pp. 1–17

Abstract

Measuring the phenotypic effect of treatments on cells through imaging assays is an efficient and powerful way of studying cell biology, and requires computational methods for transforming images into quantitative data. Here, we present an improved strategy for learning representations of treatment effects from high-throughput imaging, following a causal interpretation. We use weakly supervised learning for modeling associations between images and treatments, and show that it encodes both confounding factors and phenotypic features in the learned representation. To facilitate their separation, we constructed a large training dataset with images from five different studies to maximize experimental diversity, following insights from our causal analysis. Training a model with this dataset successfully improves downstream performance, and produces a reusable convolutional network for image-based profiling, which we call Cell Painting CNN. We evaluated our strategy on three publicly available Cell Painting datasets, and observed that the Cell Painting CNN improves performance in downstream analysis by up to 30% with respect to classical features, while also being more computationally efficient.

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
