Biological Imaging (Jan 2023)

Scipion3: A workflow engine for cryo-electron microscopy image processing and structural biology

  • Pablo Conesa,
  • Yunior C. Fonseca,
  • Jorge Jiménez de la Morena,
  • Grigory Sharov,
  • Jose Miguel de la Rosa-Trevín,
  • Ana Cuervo,
  • Alberto García Mena,
  • Borja Rodríguez de Francisco,
  • Daniel del Hoyo,
  • David Herreros,
  • Daniel Marchan,
  • David Strelak,
  • Estrella Fernández-Giménez,
  • Erney Ramírez-Aportela,
  • Federico Pedro de Isidro-Gómez,
  • Irene Sánchez,
  • James Krieger,
  • José Luis Vilas,
  • Laura del Cano,
  • Marcos Gragera,
  • Mikel Iceta,
  • Marta Martínez,
  • Patricia Losana,
  • Roberto Melero,
  • Roberto Marabini,
  • José María Carazo,
  • Carlos Oscar Sánchez Sorzano

DOI
https://doi.org/10.1017/S2633903X23000132
Journal volume & issue
Vol. 3

Abstract


Image-processing pipelines require the design of complex workflows that combine many different steps to take the raw acquired data to a final result with biological meaning. In the image-processing domain of cryo-electron microscopy single-particle analysis (cryo-EM SPA), hundreds of steps must be performed to obtain the three-dimensional structure of a biological macromolecule by integrating data spread over thousands of micrographs containing millions of copies of allegedly the same macromolecule. Executing such complicated workflows demands a dedicated tool to keep track of all the steps performed. Additionally, because of the extremely low signal-to-noise ratio (SNR), the estimation of any image parameter is heavily affected by noise, resulting in a significant fraction of incorrect estimates. Although a low SNR and the processing of millions of images through hundreds of sequential steps requiring substantial computational resources are characteristic of cryo-EM, these features may also be found in other biological imaging domains. Here, we present Scipion, a generic, open-source, Python-based workflow engine specifically adapted for image processing. Its main characteristics are: (a) interoperability, (b) smart object model, (c) gluing operations, (d) comparison operations, (e) wide set of domain-specific operations, (f) execution in streaming, (g) smooth integration in high-performance computing environments, (h) execution with and without graphical capabilities, (i) flexible visualization, (j) user authentication and private access to private data, (k) scripting capabilities, (l) high performance, (m) traceability, (n) reproducibility, (o) self-reporting, (p) reusability, (q) extensibility, (r) software updates, and (s) non-restrictive software licensing.
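The need to keep track of every step, and the traceability and reproducibility features listed above, can be illustrated with a toy sketch of step-level provenance. This is not Scipion's API or object model: `ToyWorkflow`, `StepRecord`, `filter_by_score` and `normalize` are hypothetical names, and the steps operate on plain lists of numbers rather than micrographs or particle images. The sketch assumes each step is a deterministic function of its input and parameters.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StepRecord:
    """Provenance entry for one executed step (hypothetical structure)."""
    name: str
    params: Dict[str, Any]
    input_hash: str
    output_hash: str


def _digest(obj: Any) -> str:
    """Short SHA-256 of a JSON-serializable object; equal data gives equal hashes."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class ToyWorkflow:
    """Runs steps in order and records what each one consumed and produced."""
    steps: List[Tuple[str, Callable[..., Any], Dict[str, Any]]] = field(default_factory=list)
    history: List[StepRecord] = field(default_factory=list)

    def add_step(self, name: str, func: Callable[..., Any], **params: Any) -> None:
        self.steps.append((name, func, params))

    def run(self, data: Any) -> Any:
        self.history = []  # each run starts a fresh provenance log
        for name, func, params in self.steps:
            input_hash = _digest(data)
            data = func(data, **params)
            self.history.append(StepRecord(name, dict(params), input_hash, _digest(data)))
        return data


# Hypothetical example steps operating on plain numbers instead of images.
def filter_by_score(scores: List[float], threshold: float) -> List[float]:
    return [s for s in scores if s >= threshold]


def normalize(values: List[float], scale: float) -> List[float]:
    return [v / scale for v in values]


if __name__ == "__main__":
    wf = ToyWorkflow()
    wf.add_step("filter_by_score", filter_by_score, threshold=0.5)
    wf.add_step("normalize", normalize, scale=2.0)
    result = wf.run([0.2, 0.6, 0.9])
    print(result)
    for record in wf.history:
        print(record.name, record.params, record.input_hash, "->", record.output_hash)
```

Because each record stores the step's parameters together with hashes of what it consumed and produced, the log chains steps together (one step's input hash equals the previous step's output hash), and rerunning the same steps on the same input reproduces the same log. A mismatch between two logs points to the first step where the runs diverged.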

Keywords