Model enhanced reinforcement learning to enable precision dosing: A theoretical case study with dosing of propofol

Benjamin Ribba; Dominic Stefan Bräm; Paul Gabriel Baverel; Richard Wilson Peck

doi:10.1002/psp4.12858

CPT: Pharmacometrics & Systems Pharmacology (Nov 2022)

Affiliations

Benjamin Ribba: Roche Pharma Research and Early Development (pRED) F. Hoffmann La Roche Ltd. Basel Switzerland
Dominic Stefan Bräm: Roche Pharma Research and Early Development (pRED) F. Hoffmann La Roche Ltd. Basel Switzerland
Paul Gabriel Baverel: Roche Pharma Research and Early Development (pRED) F. Hoffmann La Roche Ltd. Basel Switzerland
Richard Wilson Peck: Roche Pharma Research and Early Development (pRED) F. Hoffmann La Roche Ltd. Basel Switzerland

Journal volume & issue: Vol. 11, no. 11
pp. 1497 – 1510

Abstract

Extending the potential of precision dosing requires evaluating methodologies that offer more flexibility and a higher degree of personalization. Reinforcement learning (RL) holds promise in its ability to integrate multidimensional data in an adaptive process built toward efficient decision making centered on sustainable value creation. For general anesthesia in intensive care units, RL is applied to adjust dosing automatically by monitoring the patient's level of consciousness. We further explore the problem of optimal control of anesthesia with propofol by combining RL with state-of-the-art tools used to inform dosing in drug development. In particular, we used pharmacokinetic-pharmacodynamic (PK-PD) modeling as a simulation engine to generate experience from dosing scenarios that cannot be tested experimentally. Through simulations, we show that, when learning from retrospective trial data, more than 100 patients are needed to reach an accuracy within the range of what is achieved with a standard dosing solution. However, embedding a model of drug effect within the RL algorithm improves accuracy, reducing errors to target by 90% by learning to take dosing actions that maximize long-term benefit. Residual variability in the data affects accuracy, whereas the algorithm coped efficiently with up to 50% interindividual variability in the PK and 25% in the PD model parameters. We illustrate how extending the state definition of the RL agent with meaningful variables is key to achieving a highly accurate optimal dosing policy. These results suggest that RL constitutes an attractive approach for precision dosing when rich data are available or when complemented with synthetic data from model-based tools used in model-informed drug development.
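The idea of using a PK-PD model as a simulation engine for an RL agent can be illustrated with a minimal sketch. It is not the authors' implementation: the one-compartment PK model, the Emax PD model, every parameter value, the 10-unit state bins, the set of infusion rates, and the choice of plain tabular Q-learning are all assumptions made here for illustration only. The agent observes a discretized drug effect, picks an infusion rate, and is rewarded for staying close to a target effect.

```python
import math
import random

# All values below are illustrative assumptions, not taken from the paper.
V = 10.0       # volume of distribution (L)
K_EL = 0.2     # elimination rate constant (1/min)
E0 = 95.0      # baseline effect on a consciousness-index-like scale
EMAX = 90.0    # maximal drop in effect
EC50 = 3.0     # concentration giving half of the maximal drop (mg/L)
TARGET = 50.0  # target effect
DT = 1.0       # time between dosing decisions (min)
ACTIONS = [0.0, 5.0, 10.0, 20.0, 40.0]  # candidate infusion rates (mg/min)


def effect(conc):
    """Emax PD model: the effect falls from E0 as concentration rises."""
    return E0 - EMAX * conc / (EC50 + conc)


def step(conc, rate):
    """Advance a one-compartment PK model by DT under a constant infusion.

    Uses the closed-form solution of dC/dt = rate / V - K_EL * C.
    """
    c_ss = rate / (K_EL * V)  # steady-state concentration for this rate
    return c_ss + (conc - c_ss) * math.exp(-K_EL * DT)


def to_state(e):
    """Discretize the observed effect into ten 10-unit bins (0..9)."""
    return min(9, max(0, int(e // 10)))


def train(episodes=2000, horizon=60, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on experience simulated from the PK-PD model."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(10)]
    for _ in range(episodes):
        conc = 0.0
        s = to_state(effect(conc))
        for _ in range(horizon):
            if rng.random() < eps:  # explore
                a = rng.randrange(len(ACTIONS))
            else:                   # exploit the current estimate
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            conc = step(conc, ACTIONS[a])
            e = effect(conc)
            reward = -abs(e - TARGET)  # penalize distance to target
            s_next = to_state(e)
            q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q


def rollout(q, horizon=60):
    """Mean absolute error to target when following the greedy policy."""
    conc, errors = 0.0, []
    for _ in range(horizon):
        s = to_state(effect(conc))
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        conc = step(conc, ACTIONS[a])
        errors.append(abs(effect(conc) - TARGET))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    q_table = train()
    print("mean |effect - target| under greedy policy:", rollout(q_table))
```

In this sketch the state is only the binned effect. The abstract's point that extending the state definition with meaningful variables improves accuracy would correspond to adding further inputs to `to_state`, which is a hypothetical helper introduced here.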

Published in CPT: Pharmacometrics & Systems Pharmacology

ISSN: 2163-8306 (Online)
Publisher: Wiley
Country of publisher: United States
LCC subjects: Medicine: Therapeutics. Pharmacology
Website: https://ascpt.onlinelibrary.wiley.com/journal/21638306
