Catalysts (Aug 2023)

Engineering of Substrate Tunnel of P450 CYP116B3 through Machine Learning

  • Yiheng Liu,
  • Zhongyu Li,
  • Chenqi Cao,
  • Xianzhi Zhang,
  • Shuaiqi Meng,
  • Mehdi D. Davari,
  • Haijun Xu,
  • Yu Ji,
  • Ulrich Schwaneberg,
  • Luo Liu

DOI
https://doi.org/10.3390/catal13081228
Journal volume & issue
Vol. 13, no. 8
p. 1228

Abstract

The combinatorial complexity of protein sequence space makes recombination experiments that target beneficial positions difficult. To overcome this, a machine learning (ML) approach was employed: a model was trained on a limited literature dataset and then combined with iterative variant generation and the incorporation of experimental data. The PyPEF method was used to identify existing variants and to predict recombinant variants targeting the substrate channel of P450 CYP116B3. Eight improved variants carrying multiple substitutions were validated through molecular dynamics simulations. Specifically, the RMSF of variant A86T/T91H/M108S/A109M/T111P decreased from 3.06 Å (wild type) to 1.07 Å, and the average RMSF of variant A86T/T91P/M108V/A109M/T111P decreased to 1.41 Å, compared with 1.53 Å for the wild type. Of particular significance, variant A86T/T91H/M108G/A109M/T111P was predicted to exhibit an activity approximately 15 times higher than that of the wild type. During selection of the regression model, PLS and MLP regressions were compared, and the effects of data size and data relevance on the two approaches are summarized. These results support the feasibility of combining ML with experimental approaches. The integrated strategy proves effective for exploring potential variants within the protein sequence space and facilitates a deeper understanding of the substrate channel in P450 CYP116B3.
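For readers unfamiliar with the stability metric quoted above, RMSF (root mean square fluctuation) measures how far each atom or residue deviates from its time-averaged position over a simulation. The following is a minimal Python sketch of that standard definition, assuming the frames have already been aligned to a common reference. The function name `rmsf` and the toy coordinates are illustrative assumptions and are not taken from the paper or its analysis pipeline.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom, and all
    frames are assumed to be aligned to the same reference structure.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared deviation of atom i from its average position
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Hypothetical toy trajectory: 2 atoms, 2 frames, coordinates in angstroms
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
per_atom = rmsf(toy)
average_rmsf = sum(per_atom) / len(per_atom)
```

Averaging the per-atom values, as in `average_rmsf`, gives a single number that can be compared between a variant and the wild type; a lower value indicates a less flexible structure over the simulated time window.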

Keywords