Machine learning optimization of candidate antibody yields highly diverse sub-nanomolar affinity antibody libraries

Lin Li; Esther Gupta; John Spaeth; Leslie Shing; Rafael Jaimes; Emily Engelhart; Randolph Lopez; Rajmonda S. Caceres; Tristan Bepler; Matthew E. Walsh

doi:10.1038/s41467-023-39022-2

Nature Communications (Jun 2023)

Affiliations

Lin Li: Massachusetts Institute of Technology Lincoln Laboratory
Esther Gupta: Massachusetts Institute of Technology Lincoln Laboratory
John Spaeth: Massachusetts Institute of Technology Lincoln Laboratory
Leslie Shing: Massachusetts Institute of Technology Lincoln Laboratory
Rafael Jaimes: Massachusetts Institute of Technology Lincoln Laboratory
Emily Engelhart: A-Alpha Bio, Inc.
Randolph Lopez: A-Alpha Bio, Inc.
Rajmonda S. Caceres: Massachusetts Institute of Technology Lincoln Laboratory
Tristan Bepler: Research Laboratory of Electronics, Massachusetts Institute of Technology
Matthew E. Walsh: Massachusetts Institute of Technology Lincoln Laboratory

Journal volume & issue: Vol. 14, no. 1, pp. 1–12

Abstract

Therapeutic antibodies are an important and rapidly growing drug modality. However, the design and discovery of early-stage antibody therapeutics remain a time- and cost-intensive endeavor. Here we present an end-to-end Bayesian, language model-based method for designing large and diverse libraries of high-affinity single-chain variable fragments (scFvs) that are then empirically measured. In a head-to-head comparison with a directed evolution approach, we show that the best scFv generated from our method represents a 28.7-fold improvement in binding over the best scFv from directed evolution. Additionally, 99% of designed scFvs in our most successful library are improvements over the initial candidate scFv. By comparing a library’s predicted success to actual measurements, we demonstrate our method’s ability to explore tradeoffs between library success and diversity. Results of our work highlight the significant impact machine learning models can have on scFv development. We expect our method to be broadly applicable and provide value to other protein engineering tasks.

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
