npj Computational Materials (Jun 2023)

Validating neural networks for spectroscopic classification on a universal synthetic dataset

  • Jan Schuetzke,
  • Nathan J. Szymanski,
  • Markus Reischl

DOI
https://doi.org/10.1038/s41524-023-01055-y
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 12

Abstract

To aid the development of machine learning models for automated spectroscopic data classification, we created a universal synthetic dataset for the validation of their performance. The dataset mimics the characteristic appearance of experimental measurements from techniques such as X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy, among others. We applied eight neural network architectures to classify artificial spectra, evaluating their ability to handle common experimental artifacts. While all models achieved over 98% accuracy on the synthetic dataset, misclassifications occurred when spectra had overlapping peaks or intensities. We found that non-linear activation functions, specifically ReLU in the fully-connected layers, were crucial for distinguishing between these classes, while adding more sophisticated components, such as residual blocks or normalization layers, provided no performance benefit. Based on these findings, we summarize key design principles for neural networks in spectroscopic data classification and publicly share all scripts used in this study.
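As a rough illustration of what a synthetic spectrum with overlapping peaks can look like, the sketch below builds one-dimensional patterns from Gaussian peaks and adds small variations in position, intensity, and noise. It is not the authors' published script: the Gaussian peak shape, the function names, the two example classes, and every parameter value are assumptions chosen only for illustration.

```python
import math
import random


def gaussian_peak(x, center, height, width):
    """Value at position x of a Gaussian peak (assumed peak shape)."""
    return height * math.exp(-0.5 * ((x - center) / width) ** 2)


def synthetic_spectrum(peaks, n_points=500, max_shift=3.0,
                       intensity_jitter=0.2, noise=0.01, rng=None):
    """Build one noisy 1D spectrum from (center, height, width) peaks.

    All parameter names and default values are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    # Assumption: a single random shift moves the whole pattern.
    shift = rng.uniform(-max_shift, max_shift)
    # Assumption: each peak height varies independently by up to +/-20%.
    varied = [
        (c + shift, h * (1.0 + rng.uniform(-intensity_jitter, intensity_jitter)), w)
        for c, h, w in peaks
    ]
    # Sum all peaks at every point and add Gaussian measurement noise.
    signal = [
        sum(gaussian_peak(x, c, h, w) for c, h, w in varied) + rng.gauss(0.0, noise)
        for x in range(n_points)
    ]
    top = max(signal)
    return [v / top for v in signal]  # scale so the strongest point is 1.0


# Two hypothetical classes: the main peaks sit 4 points apart, which is
# within the +/-3 point shift range, and the second peaks differ only in
# nominal height (0.6 vs 0.4).
CLASS_A = [(120.0, 1.0, 4.0), (300.0, 0.6, 4.0)]
CLASS_B = [(124.0, 1.0, 4.0), (300.0, 0.4, 4.0)]

rng = random.Random(0)  # fixed seed so the example is reproducible
spectrum_a = synthetic_spectrum(CLASS_A, rng=rng)
spectrum_b = synthetic_spectrum(CLASS_B, rng=rng)
```

With these assumed settings, a shifted sample of one class can place its main peak where the other class's main peak nominally sits, so peak position alone no longer separates the two. This is the kind of overlap the abstract associates with misclassifications.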