IEEE Access (Jan 2021)

When SMILES Smiles, Practicality Judgment and Yield Prediction of Chemical Reaction via Deep Chemical Language Processing

  • Shu Jiang,
  • Zhuosheng Zhang,
  • Hai Zhao,
  • Jiangtong Li,
  • Yang Yang,
  • Bao-Liang Lu,
  • Ning Xia

DOI
https://doi.org/10.1109/ACCESS.2021.3083838
Journal volume & issue
Vol. 9
pp. 85071 – 85083

Abstract

The Simplified Molecular Input Line Entry System (SMILES) provides a text-based encoding for describing the structure of chemical species and for formalizing general chemical reactions. Because chemical reactions can thus be represented in a language-like form, we present a symbol-only model that predicts the yield of organic synthesis reactions in general, without complex quantum-physical modeling or chemistry knowledge. Our model is the first deep neural network application that treats chemical reaction text segments as embedding representations for the most recent deep natural language processing methods. Experimental results show that our model can effectively predict chemical reactions: it achieves a high accuracy of 99.76% on practicality judgment and a Root Mean Square Error (RMSE) of around 0.2 on yield prediction. Our work shows the great potential of automatic yield prediction for organic reactions under general conditions, and of further applications in synthesis path prediction, at the least modeling cost.
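
To make the "reaction as text" idea concrete, the following is a minimal sketch, not the authors' model or preprocessing. It assumes the common reaction SMILES layout `reactants>agents>products`, uses a deliberately simplified, hypothetical tokenizer (`tokenize_smiles`), and shows how an RMSE figure like the one quoted above is computed. The function names, the example reaction, and the tokenization rules are all illustrative assumptions; the abstract does not specify them.

```python
import math
import re

# Hypothetical, simplified token pattern (an assumption, not the paper's):
# bracket atoms, the two-letter halogens Br and Cl, two-digit ring closures,
# then any other single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split one SMILES string into a list of symbol tokens."""
    return TOKEN_PATTERN.findall(smiles)


def split_reaction(reaction_smiles):
    """Split 'reactants>agents>products' into lists of molecule strings."""
    reactants, agents, products = reaction_smiles.split(">")

    def molecules(part):
        # Molecules within one role are separated by '.'.
        return [m for m in part.split(".") if m]

    return {
        "reactants": molecules(reactants),
        "agents": molecules(agents),
        "products": molecules(products),
    }


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative reaction: acetic acid + ethanol -> ethyl acetate + water.
reaction = split_reaction("CC(=O)O.OCC>>CC(=O)OCC.O")
tokens = tokenize_smiles(reaction["reactants"][0])
```

A sequence model would then map each token to an embedding vector, which is the step the abstract refers to; the tokenizer here only illustrates that a reaction can be handled purely as a string of symbols.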

Keywords