Frontiers in High Performance Computing (Mar 2025)

End-to-end deep learning pipeline for real-time Bragg peak segmentation: from training to large-scale deployment

  • Cong Wang,
  • Valerio Mariani,
  • Frédéric Poitevin,
  • Matthew Avaylon,
  • Jana Thayer

DOI
https://doi.org/10.3389/fhpcp.2025.1536471
Journal volume & issue
Vol. 3

Abstract

Read online

X-ray crystallography reconstruction, which transforms discrete X-ray diffraction patterns into three-dimensional molecular structures, relies critically on accurate Bragg peak finding for structure determination. As X-ray free electron laser (XFEL) facilities advance toward MHz data rates (1 million images per second), traditional peak finding algorithms that require manual parameter tuning or exhaustive grid searches across multiple experiments become increasingly impractical. While deep learning approaches offer promising solutions, their deployment in high-throughput environments presents significant challenges in automated dataset labeling, model scalability, edge deployment efficiency, and distributed inference capabilities. We present an end-to-end deep learning pipeline with three key components: (1) a data engine that combines traditional algorithms with our peak matching algorithm to generate high-quality training data at scale, (2) a modular architecture that scales from a few million to hundreds of million parameters, enabling us to train large expert-level models offline while deploying smaller, distilled models at the edge, and (3) a decoupled producer-consumer architecture that separates specialized data source layer from model inference, enabling flexible deployment across diverse computing environments. Using this integrated approach, our pipeline achieves accuracy comparable to traditional methods tuned by human experts while eliminating the need for experiment-specific parameter tuning. Although current throughput requires optimization for MHz facilities, our system's scalable architecture and demonstrated model compression capabilities provide a foundation for future high-throughput XFEL deployments.

Keywords