Communications Chemistry (Oct 2024)
Machine learning analysis of a large set of homopolymers to predict glass transition temperatures
Abstract
Glass transition temperature (Tg) is an important thermophysical property of polymers, but it can be difficult to measure experimentally. Data-driven machine learning approaches are therefore useful alternatives for estimating Tg values in a high-throughput way. In this study, a dataset of more than 900 polymers with reported Tg values was assembled from various public sources to develop a predictive model of the structure-property relationship. The collected dataset was curated, explored via cluster analysis, and split into training and test sets for validation, and the polymer structures were then characterized by molecular descriptors. Several machine learning techniques were explored to build the models, including multiple linear regression (MLR), k-nearest neighbors (k-NN), support vector machine (SVM), random forest (RF), Gaussian process regression (GPR), and multi-layer perceptron (MLP). As a result, a model based on a subset of 15 descriptors that accurately predicts glass transition temperatures was developed. Electronic effect indices were found to be important features that contribute positively to Tg values. The SVM-based model showed the best performance, with coefficients of determination (R²) of 0.813 and 0.770 for the training and test sets, respectively, and the lowest estimation error (RMSE = 0.062). In addition, the developed structure-property model was implemented as a web app that serves as an online computational tool for designing and evaluating new homopolymers with desired glass transition profiles.
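The evaluation metrics quoted above, R² and RMSE, can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the abstract does not name the 15 descriptors, the SVM settings, or the units of the RMSE. The sketch therefore makes two assumptions. First, Tg values are min-max scaled to [0, 1], since an RMSE of 0.062 would be consistent with, but is not confirmed to be, an error on a rescaled target. Second, it uses k-NN, one of the techniques the study also explored, in place of SVM because k-NN fits in a few lines. The function names and the toy descriptor vectors are hypothetical.

```python
from math import sqrt
from statistics import fmean


def min_max_scale(values):
    """Rescale values linearly to [0, 1] (assumed preprocessing; requires max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def knn_predict(train_X, train_y, x, k=3):
    """Hypothetical k-NN regressor: mean target of the k training rows nearest to x.

    Each row of train_X is a descriptor vector; distance is Euclidean.
    """
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, x))), y)
        for row, y in zip(train_X, train_y)
    )
    return fmean([y for _, y in dists[:k]])


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as y_true."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

In use, one would fit on the training split, call `knn_predict` for each test polymer, and report `r2_score` and `rmse` separately for the training and test sets, as the abstract does. A gap between training and test R² (0.813 versus 0.770 for the SVM model) is the usual signal of how well the model generalizes beyond the data it was fitted on.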