npj Computational Materials (Jan 2021)

On strong-scaling and open-source tools for analyzing atom probe tomography data

  • Markus Kühbach,
  • Priyanshu Bajaj,
  • Huan Zhao,
  • Murat H. Çelik,
  • Eric A. Jägle,
  • Baptiste Gault

DOI
https://doi.org/10.1038/s41524-020-00486-1
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 10

Abstract

The development of strong-scaling computational tools for high-throughput methods with open-source code and transparent metadata standards has successfully transformed many computational materials science communities. While such tools are already mature in the condensed-matter physics community, the situation is still very different for many experimentalists. Atom probe tomography (APT) is one example. This microscopy and microanalysis technique has matured into a versatile nano-analytical characterization tool with applications that range from materials science to geology and possibly beyond. Here, data science tools are required for extracting chemo-structural spatial correlations from the reconstructed point cloud. For APT and other high-end analysis techniques, post-processing is mostly executed with proprietary software tools, which are opaque in their execution and often have limited performance. Software development by members of the scientific community has improved the situation, but compared to the sophistication of the field of computational materials science, several gaps remain. This is particularly the case for open-source tools that support scientific computing hardware, tools that enable high-throughput workflows, and open, well-documented metadata standards that align experimental research better with the FAIR data stewardship principles. To this end, we introduce paraprobe, an open-source tool for scientific computing and high-throughput study of point cloud data, here exemplified with APT. We show how to quantify uncertainties while applying several computational geometry, spatial statistics, and clustering tasks for post-processing APT datasets as large as two billion ions. These tools work well in concert with Python and HDF5 to enable performance gains of several orders of magnitude, as well as automation and reproducibility.