The predictive power of data-processing statistics

Melanie Vollmar; James M. Parkhurst; Dominic Jaques; Arnaud Baslé; Garib N. Murshudov; David G. Waterman; Gwyndaf Evans

doi:10.1107/S2052252520000895

IUCrJ (Mar 2020)

Affiliations

Melanie Vollmar: Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, England
James M. Parkhurst: Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, England
Dominic Jaques: Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, England
Arnaud Baslé: Institute for Cell and Molecular Biosciences, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 1HH, England
Garib N. Murshudov: MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, England
David G. Waterman: Science Technology and Facilities Council, Rutherford Appleton Laboratory, Didcot OX11 0FA, England
Gwyndaf Evans: Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot OX11 0DE, England

Journal volume & issue: Vol. 7, no. 2
pp. 342–354

Abstract

This study describes a method to estimate the likelihood of success in determining a macromolecular structure by X-ray crystallography and experimental single-wavelength anomalous dispersion (SAD) or multiple-wavelength anomalous dispersion (MAD) phasing based on initial data-processing statistics and sample crystal properties. Such a predictive tool can rapidly assess the usefulness of data and guide the collection of an optimal data set. The increase in data rates from modern macromolecular crystallography beamlines, together with a demand from users for real-time feedback, has led to pressure on computational resources and a need for smarter data handling. Statistical and machine-learning methods have been applied to construct a classifier that displays 95% accuracy for training and testing data sets compiled from 440 solved structures. Applying this classifier to new data achieved 79% accuracy. These scores already provide clear guidance as to the effective use of computing resources and offer a starting point for a personalized data-collection assistant.
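To make concrete what "a classifier with a given accuracy" means here, the sketch below trains a plain logistic-regression classifier on a tiny hypothetical data set and reports accuracy as the fraction of correct predictions. Logistic regression is chosen only for illustration and is not necessarily the method used in the study; the two feature columns, their values, the labels (1 = structure solved, 0 = not solved), and all function names are invented assumptions, not the paper's data-processing statistics or code.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (predicted success) if P(success) >= 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w: Sequence[float], b: float,
             X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


# Hypothetical toy data: each row is [scaled statistic A, scaled statistic B];
# label 1 = structure solved, 0 = not solved. Values are invented for illustration.
X_train = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1],
           [0.2, 0.9], [0.3, 0.8], [0.1, 0.7]]
y_train = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X_train, y_train)
print("training accuracy:", accuracy(w, b, X_train, y_train))
print("prediction for a new sample:", predict(w, b, [0.85, 0.15]))
```

Accuracy on a small, linearly separable toy set like this is trivially perfect; the figure that matters in practice is accuracy on samples held out from training, which corresponds to the abstract's distinction between training/testing accuracy and accuracy on new data.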

Published in IUCrJ

ISSN: 2052-2525 (Online)
Publisher: International Union of Crystallography
Country of publisher: United Kingdom
LCC subjects: Science: Chemistry: Crystallography
Website: https://journals.iucr.org/m/