pyPcazip: A PCA-based toolkit for compression and analysis of molecular simulation data
Ardita Shkurti,
Ramon Goni,
Pau Andrio,
Elena Breitmoser,
Iain Bethune,
Modesto Orozco,
Charles A. Laughton
Affiliations
Ardita Shkurti
School of Pharmacy and Centre for Biomolecular Sciences, The University of Nottingham, University Park, Nottingham, NG7 2RD, United Kingdom
Ramon Goni
Barcelona Supercomputing Center, Jordi Girona 31, Barcelona 08034, Spain; Joint BSC-CRG-IRB Program in Computational Biology, Barcelona, Spain
Pau Andrio
Barcelona Supercomputing Center, Jordi Girona 31, Barcelona 08034, Spain; Joint BSC-CRG-IRB Program in Computational Biology, Barcelona, Spain
Elena Breitmoser
Edinburgh Parallel Computing Centre (EPCC), The University of Edinburgh, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, United Kingdom
Iain Bethune
Edinburgh Parallel Computing Centre (EPCC), The University of Edinburgh, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, United Kingdom
Modesto Orozco
Barcelona Supercomputing Center, Jordi Girona 31, Barcelona 08034, Spain; Joint BSC-CRG-IRB Program in Computational Biology, Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), Baldiri Reixach 10-12, 08028 Barcelona, Spain; Department of Biochemistry and Molecular Biology, University of Barcelona, 08028 Barcelona, Spain
Charles A. Laughton
School of Pharmacy and Centre for Biomolecular Sciences, The University of Nottingham, University Park, Nottingham, NG7 2RD, United Kingdom; Corresponding author.
The biomolecular simulation community needs novel, optimised software tools that can analyse and process, in reasonable timescales, the large amounts of molecular simulation data now being generated. To this end, we have developed and present here pyPcazip: a suite of software tools for the compression and analysis of molecular dynamics (MD) simulation data. The software is compatible with the trajectory file formats generated by most contemporary MD engines, such as AMBER, CHARMM, GROMACS and NAMD, and is MPI-parallelised to permit the efficient processing of very large datasets. pyPcazip is Unix-based, open-source (BSD-licensed) software written in Python.
Keywords: Data analysis, Principal component analysis, Molecular dynamics, Molecular simulation