Scientific Reports (Apr 2023)

Descriptor engineering in machine learning regression of electronic structure properties for 2D materials

  • Minh Tuan Dau,
  • Mohamed Al Khalfioui,
  • Adrien Michon,
  • Antoine Reserbat-Plantey,
  • Stéphane Vézian,
  • Philippe Boucaud

DOI
https://doi.org/10.1038/s41598-023-31928-7
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 10

Abstract

We build new material descriptors to predict the band gap and the work function of 2D materials with tree-based machine-learning models. The descriptors are constructed by vectorizing property matrices and by using an empirical property function, yielding mixing features that require only low-resource computations. Combined with database-based features, the mixing features significantly improve the training and prediction of the models. We find $R^{2}$ greater than 0.9 and mean absolute errors (MAE) smaller than 0.23 eV for both training and prediction. The highest $R^{2}$ values of 0.95 and 0.98 and the smallest MAEs of 0.16 eV and 0.10 eV were obtained with extreme gradient boosting for the band-gap and work-function predictions, respectively. These metrics are greatly improved compared with those of predictions based on database features alone. We also find that the hybrid features slightly reduce overfitting despite the small size of the dataset. The relevance of the descriptor-based method was assessed by predicting the electronic properties of several 2D materials belonging to new classes (oxides, nitrides, carbides) and comparing them with those from conventional computations. Our work provides a guideline for efficiently engineering descriptors, using vectorized property matrices and hybrid features, to predict 2D materials properties via ensemble models.
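
To make the idea of vectorizing a property matrix concrete, here is a minimal Python sketch. The abstract does not specify how the matrices are built or flattened, so everything below is an assumption for illustration only: the pairwise absolute-difference matrix, the sorted and zero-padded upper triangle, the helper names `property_matrix` and `vectorize`, and the placeholder database features are all hypothetical and are not the authors' construction.

```python
import numpy as np


def property_matrix(values):
    """Pairwise matrix M[i, j] = |p_i - p_j| for one elemental property.

    Assumption: an absolute-difference matrix; the paper may use another form.
    """
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])


def vectorize(matrix, size):
    """Flatten the strict upper triangle, sort descending, zero-pad to `size`.

    Sorting makes the vector independent of atom ordering; padding gives
    every material a feature vector of the same length.
    """
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    vec = np.sort(matrix[rows, cols])[::-1]
    out = np.zeros(size)
    n = min(size, vec.size)
    out[:n] = vec[:n]
    return out


# Illustrative input: Pauling electronegativities for Mo, S, S (as in MoS2).
electronegativity = [2.16, 2.58, 2.58]
mixing = vectorize(property_matrix(electronegativity), size=6)

# Placeholder values standing in for database-derived features (hypothetical).
database = np.array([1.0, 2.0])

# Hybrid feature vector: database features followed by mixing features.
features = np.concatenate([database, mixing])
```

A vector such as `features` could then be stacked row by row over all materials and passed to any tree-based regressor, which is the kind of model the abstract reports using.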