Frontiers in Microbiology (Dec 2022)

Deep learning strategies for addressing issues with small datasets in 2D materials research: Microbial Corrosion

  • Cody Allen,
  • Shiva Aryal,
  • Tuyen Do,
  • Rishav Gautum,
  • Md Mahmudul Hasan,
  • Bharat K. Jasthi,
  • Etienne Gnimpieba,
  • Venkataramana Gadhamshetty

DOI
https://doi.org/10.3389/fmicb.2022.1059123
Journal volume & issue
Vol. 13

Abstract

Protective coatings based on two-dimensional (2D) materials such as graphene have gained traction for diverse applications. Their impermeability, inertness, excellent bonding with metals, and amenability to functionalization make them promising coatings against both abiotic corrosion and microbiologically influenced corrosion (MIC). Owing to the success of graphene coatings, the whole family of 2D materials, including hexagonal boron nitride and molybdenum disulphide, is being screened for other promising coatings. AI-based data-driven models can accelerate virtual screening of 2D coatings with desirable physical and chemical properties. However, the lack of large experimental datasets makes training classifiers difficult and often results in over-fitting. Generating large datasets for the MIC resistance of 2D coatings is both complex and laborious. Deep learning data augmentation methods can alleviate this issue by generating synthetic electrochemical data that resembles the training data classes. Here, we investigated two deep generative models, namely a variational autoencoder (VAE) and a generative adversarial network (GAN), for generating synthetic data to expand small experimental datasets. Our model experimental system consisted of few-layer graphene on copper surfaces. With a neural network classifier, GAN-generated synthetic data yielded higher accuracy (83-85%) than VAE-generated synthetic data (78-80%). However, with XGBoost, VAE data performed better (90% accuracy) than GAN data (84-85% accuracy). Finally, we show that synthetic data based on VAE and GAN models can drive machine learning models for developing MIC-resistant 2D coatings.

Keywords