Automated Recognition of Chemical Molecule Images Based on an Improved TNT Model

Yanchi Li; Guanyu Chen; Xiang Li

doi:10.3390/app12020680

Applied Sciences (Jan 2022)

Affiliations

Yanchi Li: School of Computer Science, China University of Geosciences, Wuhan 430079, China
Guanyu Chen: Informatization Office, China University of Geosciences, Wuhan 430079, China
Xiang Li: School of Computer Science, China University of Geosciences, Wuhan 430079, China

DOI: https://doi.org/10.3390/app12020680
Journal volume & issue: Vol. 12, no. 2, p. 680

Abstract

The automated recognition of optical chemical structures with the help of machine learning could speed up research and development efforts. However, historical sources often have some level of image corruption, which reduces recognition performance to near zero. To address this problem, chemists need a dependable algorithm to help them extend their research. This paper reports the results of research conducted for the Bristol-Myers Squibb-Molecular Translation competition, which was held on Kaggle and invited participants to convert old chemical images to their underlying chemical structures, annotated as InChI text; we define this work as molecular translation. We propose a transformer-based model for molecular translation. To better capture the details of the chemical structure, the extracted image features need to be accurate at the pixel level. TNT is one of the existing transformer models that can meet this requirement. It was originally used for image classification and is essentially a transformer encoder, so it cannot be used directly for generation tasks. In addition, we believe that TNT does not integrate the local information of images well. We therefore improve the core module of TNT, the TNT block, and propose a novel module, the Deep TNT block. We stack this module to form the encoder and use the vanilla transformer decoder as the decoder, forming a chemical formula generation model based on the encoder–decoder structure. Since molecular translation is an image-captioning task, we named the model the Image Captioning Model based on Deep TNT (ICMDT). A comparison with different models shows that our model has advantages in both convergence speed and final description accuracy. We have also designed a complete process for the model inference and fusion phase to further enhance the final results.
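The pixel-level requirement mentioned in the abstract can be illustrated with a small sketch of nested patch tokenization, the general idea behind TNT-style encoders: an image is cut into outer patches, and each outer patch is cut again into smaller inner sub-patches. The sketch below is not the authors' implementation. The function names (`split_into_patches`, `nested_tokens`), the toy 8x8 image, and the patch sizes are all hypothetical choices for illustration, and the attention layers, the Deep TNT block, and the decoder are omitted entirely.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def split_into_patches(image: Image, patch: int) -> List[Image]:
    """Split a 2-D image into non-overlapping patch x patch tiles (row-major)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def nested_tokens(image: Image, outer: int, inner: int) -> List[List[List[float]]]:
    """For each outer patch, return its inner sub-patches flattened to vectors."""
    result = []
    for outer_patch in split_into_patches(image, outer):
        sub_patches = split_into_patches(outer_patch, inner)
        result.append([[px for row in sub for px in row] for sub in sub_patches])
    return result


# Toy 8x8 image; outer patches of 4x4 and inner sub-patches of 2x2
# are illustrative sizes only, not values taken from the paper.
toy = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = nested_tokens(toy, outer=4, inner=2)

# Number of outer patches, inner sub-patches per outer patch, pixels per sub-patch.
print(len(tokens), len(tokens[0]), len(tokens[0][0]))
```

In a full model, each inner vector and each outer patch would be embedded and passed through attention layers; here only the two-level splitting is shown, since that is the part that keeps fine-grained pixel neighbourhoods available to the encoder.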

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
