Scientific Reports (Jul 2024)

ChemReco: automated recognition of hand-drawn carbon–hydrogen–oxygen structures using deep learning

  • Hengjie Ouyang,
  • Wei Liu,
  • Jiajun Tao,
  • Yanghong Luo,
  • Wanjia Zhang,
  • Jiayu Zhou,
  • Shuqi Geng,
  • Chengpeng Zhang

DOI
https://doi.org/10.1038/s41598-024-67496-7
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 15

Abstract


Chemical molecular structures are a direct and convenient means of expressing chemical knowledge and play a vital role in academic communication. Drawing these structures by hand is a routine task for chemistry students and researchers. Converting hand-drawn structures into machine-readable formats such as SMILES (Simplified Molecular Input Line Entry System) strings would let computers process and analyze them efficiently, significantly improving the efficiency of chemical research. Furthermore, as educational technology advances, automated grading is gaining popularity. If machines can recognize hand-drawn molecular structures and judge whether they are correct, this greatly helps teachers. We developed ChemReco, a tool that recognizes chemical molecular structures composed of three elements: carbon (C), hydrogen (H), and oxygen (O). It is intended to assist chemical researchers. Because few studies have addressed hand-drawn chemical molecular structures, the primary focus of this paper is dataset construction. We propose a synthetic image method that rapidly generates images resembling hand-drawn chemical molecular structures, making dataset acquisition more efficient. For recognition, the model developed in this article uses an EfficientNet + Transformer encoder-decoder architecture. It outperforms the other encoder-decoder combinations evaluated and achieves a final recognition accuracy of 96.90%.
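The abstract describes a model whose decoder outputs SMILES and whose performance is reported as a single accuracy figure. The following is a minimal Python sketch, not the authors' code, of two pieces such a pipeline plausibly needs. The first splits a C/H/O-only SMILES string into tokens that a Transformer decoder could predict one at a time. The second scores predictions by exact string match. The token pattern, the function names `tokenize` and `exact_match_accuracy`, and the assumption that "accuracy" means exact sequence match are all hypothetical. The paper's actual vocabulary and metric may differ.

```python
import re

# Hypothetical token pattern for C/H/O-only SMILES (an assumption, not ChemReco's vocabulary):
#   - bracket atoms such as [OH] or [C@@H] are kept as single tokens
#   - %NN two-digit ring closures are single tokens
#   - aliphatic/aromatic carbon and oxygen: C, O, c, o (hydrogen is implicit or bracketed)
#   - bonds (-, =, #), branch parentheses, and single-digit ring closures
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|%\d{2}|[COco]|[=#\-()\d]")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into decoder tokens under the assumed vocabulary."""
    tokens = SMILES_TOKEN.findall(smiles)
    # If the tokens do not reassemble the input, some character fell outside
    # the assumed C/H/O vocabulary (e.g. N or Cl), so reject the string.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predicted SMILES strings identical to their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("no references to score")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    print(tokenize("CC(=O)O"))  # acetic acid
    print(tokenize("C[C@@H](O)C=O"))
    print(exact_match_accuracy(["CCO", "CC=O"], ["CCO", "CC(=O)O"]))
```

Exact string matching is strict: two different but valid SMILES strings for the same molecule (for example `OCC` and `CCO` for ethanol) are counted as a mismatch. A grading tool would therefore typically canonicalize both strings first, for example with a cheminformatics toolkit such as RDKit, before comparing them.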

Keywords