Results in Chemistry (Jan 2024)
Machine learning to identify structural motifs in asphaltenes
Abstract
Asphaltenes are organic compounds that aggregate in crude oil and exhibit two dominant molecular architectures: archipelago and continental. Continental architectures possess a single uniform island structure composed of aromatic rings, whereas archipelago architectures contain aromatic cores interconnected through aliphatic chains. Because the structural composition of asphaltenes varies with geographic origin, the resulting lack of uniformity makes them difficult to classify. To our knowledge, this study is the first to explore image-based supervised machine learning, specifically the ResNet-50 neural network, for the binary classification of asphaltenes into continental and archipelago motifs. A set of 255 continental and archipelago models underwent structural augmentation to yield 1,530 asphaltene structures, a sample large enough to support both the training and testing stages of the machine learning. The augmentation consisted of repeatedly adding carbons to a specified carbon on each asphaltene structure until a complete five-carbon (pentyl) chain had been attached. Using Mathematica, supervised ResNet-50 image-based classification was applied to both the original and augmented structure datasets to label each structure as either archipelago or continental. Classification was also carried out by topological similarity searching, which relies on the associations between atoms and the distances between them, for further molecule identification. The study demonstrates the surprising effectiveness of image-based classification relative to traditional topological feature-based methods. Our results indicate that deep learning techniques, especially image-based approaches, offer a new way to differentiate complex molecular structures such as asphaltenes, challenging the traditional reliance on topological features alone.
This research opens new avenues in chemical analysis and molecular characterization, highlighting the potential of machine learning in complex molecular systems.
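The augmentation scheme and the resulting dataset size can be illustrated with a short Python sketch. It assumes, as an inference from the reported numbers rather than a stated fact, that each of the 255 models contributes its original form plus five chain-extended variants (one to five added carbons), which would give 255 × 6 = 1,530 structures. The SMILES strings, the helper names `chain_variants` and `augment`, and the choice to attach the chain to the last atom written in the string are hypothetical illustration choices; the study itself used Mathematica and a specified carbon on each structure.

```python
# Illustrative sketch only: the study used Mathematica, not this code.
MAX_CHAIN_LENGTH = 5  # carbons in a complete pentyl chain


def chain_variants(smiles: str, max_len: int = MAX_CHAIN_LENGTH) -> list[str]:
    """Return the original SMILES plus variants with 1..max_len carbons appended.

    Assumption: appending "C" characters to a SMILES string bonds an
    unbranched alkyl chain to the last atom written in the string. The toy
    inputs below end in a ring-closure digit, so the chain attaches to the
    atom that closes the ring.
    """
    return [smiles + "C" * n for n in range(max_len + 1)]


def augment(dataset: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Expand (smiles, label) pairs; every variant keeps its parent's label."""
    return [(variant, label)
            for smiles, label in dataset
            for variant in chain_variants(smiles)]


# Hypothetical toy molecules standing in for asphaltene models.
toy = [
    ("c1ccc2ccccc2c1", "continental"),      # naphthalene: one fused core
    ("c1ccccc1CCc1ccccc1", "archipelago"),  # two rings joined by an ethylene bridge
]
augmented = augment(toy)
print(len(augmented))                # 12: 2 structures x 6 variants
print(augmented[5][0])               # c1ccc2ccccc2c1CCCCC

# Dataset size implied by the same scheme for the 255 reported models.
print(255 * (MAX_CHAIN_LENGTH + 1))  # 1530
```

Because every variant inherits the label of its parent structure, the augmentation enlarges the dataset without changing the balance between the two classes.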