Mining Tandem Mass Spectral Data to Develop a More Accurate Mass Error Model for Peptide Identification

Yan Fu1,2,*, Wen Gao3, Simin He1, Ruixiang Sun1, Hu Zhou4 and Rong Zeng4


1Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China,

2Graduate University of Chinese Academy of Sciences, Beijing, China,

3Peking University, Beijing, China,

4Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

To appear in PSB 2007


ABSTRACT


The assumed mass error distribution of fragment ions plays a crucial role in peptide identification from tandem mass spectra. Previous mass error models are simplistic uniform or normal distributions with empirically set parameter values. In this paper, we propose a more accurate mass error model, the conditional normal model, together with an iterative parameter learning algorithm. The new model is based on two important observations about the mass error distribution: the mean of the mass error is linear in the ion mass, and the standard deviation of the mass error is log-log linear in the peak intensity. To our knowledge, the latter quantitative relationship has not been reported before. Experimental results demonstrate that our approach accurately quantifies the mass error distribution and that the new model improves the accuracy of peptide identification.
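The two relationships stated in the abstract can be illustrated with a minimal sketch. It assumes the mass error is normally distributed given the ion mass and peak intensity, with mean a * mass + b and standard deviation exp(c * ln(intensity) + d). The parameter names a, b, c, d, the function names, the use of the natural logarithm, and the numeric values are all assumptions made for illustration; they are not taken from the paper, and the iterative parameter learning algorithm is not reproduced here.

```python
import math


def conditional_mean(ion_mass, a, b):
    """Mean of the mass error, assumed linear in the ion mass."""
    return a * ion_mass + b


def conditional_std(intensity, c, d):
    """Standard deviation of the mass error, assumed log-log linear in
    the peak intensity: ln(sigma) = c * ln(intensity) + d."""
    return math.exp(c * math.log(intensity) + d)


def log_density(mass_error, ion_mass, intensity, a, b, c, d):
    """Log of the normal density of a mass error, conditioned on the
    ion mass and the peak intensity."""
    mu = conditional_mean(ion_mass, a, b)
    sigma = conditional_std(intensity, c, d)
    z = (mass_error - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical parameter values, chosen only for illustration.
params = dict(a=1e-4, b=-0.05, c=-0.3, d=-1.0)

# Same mass error and ion mass, but two different peak intensities.
strong_peak = log_density(0.02, 800.0, 1000.0, **params)
weak_peak = log_density(0.02, 800.0, 10.0, **params)
```

With a negative slope c, as in this hypothetical setting, a more intense peak gets a smaller standard deviation, so the same deviation from the expected mass is judged against a tighter tolerance than it would be for a weak peak.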

Supplementary information: Experimental results on four published datasets.


Acknowledgements

This work was supported by the National Key Basic R&D Program of China (Grant No. 2002CB713807) and the National Key Technologies R&D Program of China (Grant No. 2004BA711A21). We gratefully acknowledge Dr. Andrew Keller from the Institute for Systems Biology for valuable comments on an early version of the paper. We also thank Prof. Runsheng Chen, Dr. Dongbo Bu, Jingfen Zhang, Quanhu Sheng, Jie Dai and many others from the Chinese Academy of Sciences for helpful discussions. We are grateful to Viveka Mayya from the University of Connecticut Health Center for providing the mass spectra in the UCHC LTQ dataset, John Prince from the University of Texas at Austin for providing the mass spectra in the OPD dataset, Steven Gygi from Harvard Medical School for providing the mass spectra in the HMS dataset, and Andrew Keller from the Institute for Systems Biology for providing the mass spectra in the ISB dataset.

