Mass spectrometry-based proteomics provides a holistic snapshot of the entire protein set of living cells on a molecular level. Currently, only a few deep learning approaches exist that involve peptide fragmentation spectra, which represent partial sequence information of proteins. Commonly, these approaches lack the ability to characterize less studied or even unknown patterns in spectra because of their use of explicit domain knowledge. Here, to elevate unrestricted learning from spectra, we introduce ‘ad hoc learning of fragmentation’ (AHLF), a deep learning model that is end-to-end trained on 19.2 million spectra from several phosphoproteomic datasets. AHLF is interpretable, and we show that peak-level feature importance values and pairwise interactions between peaks are in line with corresponding peptide fragments. We demonstrate our approach by detecting post-translational modifications, specifically protein phosphorylation, based on only the fragmentation spectrum without a database search. AHLF increases the area under the receiver operating characteristic curve (AUC) by an average of 9.4% on recent phosphoproteomic data compared with the current state of the art on this task. Furthermore, use of AHLF in rescoring search results increases the number of phosphopeptide identifications by a margin of up to 15.1% at a constant false discovery rate. To show the broad applicability of AHLF, we use transfer learning to also detect cross-linked peptides, as used in protein structure analysis, with an AUC of up to 94%.
Publicly available mass spectrometry (MS)-based proteomics data have grown exponentially in terms of the number of datasets and amount of data 1. This is because high-throughput proteomic studies generate a vast pool of fragmentation spectra. Each spectrum contains characteristic peak patterns that appear due to the fragmentation of a given peptide. These peptides are used to study the proteins contained in the biological sample. It might seem obvious to apply deep learning to solve various problems with this wealth of data, but the direct application of deep learning to fragmentation spectra has not yet gained traction in the community. Instead, the MS wet lab workflow is usually followed by a conventional database search 2. In a search, each acquired mass spectrum is scored against a list of candidate peptides from in silico digested proteins. For each peptide candidate, a theoretical spectrum is constructed and compared to the acquired spectrum 3. The identification of spectra remains challenging, as proteins are often either mutated or carry post-translational modifications (PTMs). PTMs are essential for various biological processes, and protein phosphorylation is an important PTM that regulates protein function and facilitates cellular signalling 4, 5. Various sophisticated algorithms exist to cope with the challenges that arise from PTMs and mutations, but these algorithms still require protein databases 6, 7, 8, 9. Some prediction approaches, for example for detecting a PTM, rely on the spectrum alone and are therefore independent of a database. However, current approaches of this kind are based on engineered features and classical machine learning 10, 11. A fragmentation spectrum can contain PTM-specific patterns (for example, relations between peaks in a spectrum) that coexist with fragments resulting from the plain peptide sequence 12. These patterns can even appear in an invariant manner, that is, they cannot pinpoint the position of a PTM within the peptide sequence, but their presence alone can reveal the PTM itself 13.
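To make the database-search step concrete, here is a minimal sketch of how a candidate peptide could be scored against an acquired spectrum. It assumes singly charged b and y ions, standard monoisotopic residue masses, and a simple shared-peak count as the score. Real search engines use more elaborate scoring functions. The function names, the 0.02 Da tolerance and the peptide `PEPSIDEK` are hypothetical illustrations and are not part of AHLF. The example also shows why PTMs complicate identification. An unmodified candidate only matches the fragments that do not contain the phosphosite.

```python
from bisect import bisect_left

# Monoisotopic residue masses in Da (standard, unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # H2O mass, Da
PHOSPHO = 79.96633  # mass shift of a phosphate group (HPO3), Da


def theoretical_spectrum(peptide, mods=None):
    """Return sorted m/z values of singly charged b and y ions.

    `mods` (hypothetical helper argument) maps 0-based residue positions
    to mass shifts, e.g. {3: PHOSPHO} for a phosphorylated fourth residue.
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal fragment
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal fragment
    return sorted(ions)


def shared_peak_count(acquired_mz, theoretical_mz, tol=0.02):
    """Count theoretical ions matched by at least one acquired peak within +/- tol Da."""
    acquired = sorted(acquired_mz)
    matched = 0
    for mz in theoretical_mz:
        j = bisect_left(acquired, mz - tol)  # first acquired peak >= mz - tol
        if j < len(acquired) and acquired[j] <= mz + tol:
            matched += 1
    return matched


# Simulated "acquired" spectrum: fragments of PEPSIDEK phosphorylated at the serine.
acquired = theoretical_spectrum("PEPSIDEK", mods={3: PHOSPHO})

unmodified_score = shared_peak_count(acquired, theoretical_spectrum("PEPSIDEK"))
phospho_score = shared_peak_count(acquired, theoretical_spectrum("PEPSIDEK", mods={3: PHOSPHO}))
print("unmodified candidate:", unmodified_score)
print("phosphorylated candidate:", phospho_score)
```

A noise-free simulated spectrum is used here only to keep the sketch self-contained. Acquired spectra also contain noise peaks, missing fragments and higher charge states. The sorted list with `bisect_left` keeps each peak lookup logarithmic in the number of acquired peaks.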