Cancer Informatics (Jan 2005)
Understanding the Characteristics of Mass Spectrometry Data through the use of Simulation
Abstract
Background: Mass spectrometry is actively being used to discover disease-related proteomic patterns in complex mixtures of proteins derived from tissue samples or from easily obtained biological fluids. The potential importance of these clinical applications has made the development of better methods for processing and analyzing the data an active area of research. It is, however, difficult to determine which methods are better without knowing the true biochemical composition of the samples used in the experiments.
Methods: We developed a mathematical model based on the physics of a simple MALDI-TOF mass spectrometer with time-lag focusing. Using this model, we implemented a statistical simulation of mass spectra. We used the simulation to explore some of the basic operating characteristics of MALDI or SELDI instruments.
Results: The simulation reproduced several characteristics of actual instruments. We found that the relative mass error is affected by the time discretization of the detector (about 0.01%) and the spread of initial velocities (about 0.1%). The accuracy of calibration based on external standards decays rapidly outside the range spanned by the calibrants. Natural isotope distributions play a major role in broadening peaks associated with individual proteins. The area of a peak is a more accurate measure of its size than the height.
Conclusions: The model described here is capable of simulating realistic mass spectra. The simulation should become a useful tool for generating spectra where the true inputs are known, allowing researchers to evaluate the performance of new methods for processing and analyzing mass spectra.
Availability: http://bioinformatics.mdanderson.org/cromwell.html
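
As a rough illustration of the time-discretization effect reported in the Results, the following Python sketch assumes an idealized linear time-of-flight instrument with no time-lag focusing and no spread of initial velocities, which is much simpler than the model described here. The accelerating voltage (20 kV), drift length (1 m), and detector clock tick (4 ns) are illustrative assumptions rather than values taken from the paper, and the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, coulombs
DALTON = 1.66053906660e-27   # atomic mass constant, kilograms


def flight_time(mass_da, voltage=20000.0, length=1.0, charge=1):
    """Ideal drift time in seconds for an ion of mass `mass_da` (daltons).

    Assumes the ion gains kinetic energy charge * e * voltage and then
    drifts `length` metres at constant velocity (an assumed, simplified model).
    """
    mass_kg = mass_da * DALTON
    return length * math.sqrt(mass_kg / (2.0 * charge * E_CHARGE * voltage))


def mass_from_time(t, voltage=20000.0, length=1.0, charge=1):
    """Invert flight_time: recover the mass in daltons from a drift time."""
    return 2.0 * charge * E_CHARGE * voltage * (t / length) ** 2 / DALTON


def relative_error_from_discretization(mass_da, tick=4e-9, **instrument):
    """Relative mass error caused by rounding the arrival time to a clock tick."""
    t = flight_time(mass_da, **instrument)
    t_recorded = round(t / tick) * tick   # detector only reports whole ticks
    mass_estimate = mass_from_time(t_recorded, **instrument)
    return abs(mass_estimate - mass_da) / mass_da


if __name__ == "__main__":
    for mass in (1000.0, 10000.0, 50000.0):
        err = relative_error_from_discretization(mass)
        print(f"{mass:8.0f} Da: relative error {100 * err:.5f}%")
```

Because mass scales with the square of the flight time in this simplified model, a rounding error of at most half a tick gives a relative mass error of at most roughly `tick / t`. Heavier ions fly longer, so this bound on their relative error is smaller under these assumptions.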