The Global Proteome Machine Organization
   The Global Proteome Machine
The home of proteomics crowdsourced "Big Data". The Global Proteome Machine is an experimental bioinformatics project. It involves research on the generation of information and knowledge from proteomics data and its reuse to improve the utility of proteomics for biomedical and biological research.
The Global Proteome Machine Database
   Latest GPM News
Data set of the week: (2012/01/22)
Characterization of the Asia Oceania Human Proteome Organisation Membrane Proteomics Initiative Standard using SDS-PAGE shotgun proteomics.
This data set consisted of 6 experiments from LC/MS/MS runs. The data was published by Peng L, Kapp EA, McLauchlan D, and Jordan TW. in Proteomics 2011 11:4376-84 (PubMed).
These experiments provide insight into how straightforward it has become to identify membrane proteins. Using a fairly simple sample preparation method and LC/MS/MS with an LTQ instrument, the results show that large numbers of membrane proteins can be identified easily. It is still common for people to suggest that membrane proteins are "difficult" to analyze using proteomics techniques. These results show that they are really no more difficult than any other class of protein, so long as they can be kept in solution long enough to be digested.
RFC GPM-2011.12.14 adopted (2012.01.17)
The Request-For-Comments GPM-2011.12.14 entitled "Nomenclature for the description of protein sequence modifications" has been adopted by the GPM. The RFC describes a systematic method for recording modifications associated with protein sequences, which can also be used to formulate queries about protein modifications to any compliant database system. GPM and GPMDB will be modified over the next few months to be compliant with this new specification. We'd like to thank everyone who sent in comments, almost all of which ended up in the final version of the document.
Data set of the week: (2012/01/15)
Deep proteome and transcriptome mapping of a human cancer cell line.
This data set consisted of 164 experiments from multidimensional LC/MS/MS runs. The data was published by Nagaraj N, Wisniewski JR, Geiger T, Cox J, Kircher M, Kelso J, Pääbo S, and Mann M. in Mol Syst Biol. 2011 7:548 (PubMed).
This data set is an extensive investigation of how many peptides can be identified from the limited proteome of a single human cell line using a combination of straightforward LC/MS/MS methods, multidimensional chromatography, multiple proteases, high-resolution MS/MS via HCD, and careful, consistently state-of-the-art lab work. For the large number of groups that use HeLa cells, this work should serve as a reference for what can be seen and what sort of experiment should be done to see it. For anyone interested in bioinformatics and algorithm development, the scale (> 200,000 protein identifications) and precision of the work make it an excellent example for trying out new ideas. It is also an excellent raw data set in which to find novel post-translational modifications, splice variants, viral contaminants and amino acid polymorphisms.
Data set of the week: (2012/01/08)
iPRG-2011: Study Materials for Identification of Electron Transfer Dissociation (ETD) Mass Spectra.
This data set consisted of 1 SCX fraction LC/MS/MS run on a Thermo Orbitrap-LTQ hybrid instrument. The data was made available on TRANCHE by the ABRF iPRG group (Robert J Chalkley, Nuno Bandeira, John Cottrell, Eric Deutsch, Eugene A. Kapp, Henry H. Lam, W. Hayes McDonald and Thomas Neubert) and has been described on the iPRG web site.
This rather oddball data set provides more insight into the "chilli-cook-off" mentality associated with evaluating bioinformatics algorithms than it does into the current real-world problems in biomedical research. Tests of this sort can be useful when their goals are to provide feedback to algorithm and user interface designers and to inform users of the characteristics of algorithm performance. It is questionable whether any of these aims were achieved by analyzing this data set.
The data was artificially removed from context (only one of 21 SCX fractions was made available). The sample preparation methods used generated very high levels of non-enzymatic cleavage (22% of observable peptides), unusually high levels of asparagine deamidation (48% of N-containing peptides) and peptide N-terminal glutamine cyclization (88% of peptides with an N-terminal Q). The mass measurements had large parent ion and fragment ion systematic errors (+5 ppm and -0.25 Da respectively) and standard deviations (4 ppm and 0.3 Da). The proteins in the sample were heavily skewed towards the cytosolic proteins and the added human sequence standard proteins (Sigma UPS). The lack of the other 20 fractions made it impossible to draw any conclusions about the relative observability of the added UPS proteins (and the ribosomal E. coli protein contaminants in the UPS preparation). It was very unclear why such a complex, poorly controlled sample/measurement combination was used to test algorithms, and why so little information about the true character of the sample was provided to the participating groups. This hidden complexity resulted in more of an examination of the detective abilities of the groups than a useful test of the algorithms.
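For readers unfamiliar with how figures such as a "+5 ppm systematic error" and a "4 ppm standard deviation" are derived, the following is a minimal sketch in Python. It assumes a list of (observed, theoretical) parent ion m/z pairs is already available from confident identifications; the function names and the example values are hypothetical and are not taken from the iPRG-2011 data or from any GPM software.

```python
from statistics import mean, stdev


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error of one measurement, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_errors(pairs):
    """Return (systematic error, standard deviation) in ppm.

    `pairs` is an iterable of (observed m/z, theoretical m/z) tuples.
    The systematic error is taken here to be the mean of the ppm errors,
    and the spread is their sample standard deviation.
    """
    errors = [ppm_error(obs, theo) for obs, theo in pairs]
    return mean(errors), stdev(errors)


# Hypothetical measurements, for illustration only.
example_pairs = [
    (500.0030, 500.0000),
    (750.0030, 750.0000),
    (1000.0060, 1000.0000),
]

systematic_ppm, spread_ppm = summarize_errors(example_pairs)
print(f"systematic error: {systematic_ppm:+.2f} ppm, std dev: {spread_ppm:.2f} ppm")
```

A non-zero mean indicates a calibration offset that shifts every measurement, while the standard deviation describes the scatter that remains after that offset is removed. Fragment ion errors quoted in Da would be summarized the same way using the plain difference rather than the ppm ratio.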
New Editions of the Human and Mouse Proteome Guides Released (2012.01.03)
The latest editions (2012.01.01) of both the GPM Homo sapiens and Mus musculus Proteome Guides have been made available. The Guides are the results of an automated curation of the >200 million human and >50 million mouse peptide identifications in GPMDB. The Guides use ENSEMBL v. 62 protein sequences and their chromosome coordinates are aligned to the human GRCh37 genome and mouse NCBIM37 genome builds, respectively. The Guides are available either as spreadsheets or in HTML format and they may be downloaded either from the links above or the GPM Annotation Project ftp server.
Data set of the week: (2012/01/01)
Proteomic Analysis of a Pleistocene Mammoth Femur Reveals More than One Hundred Ancient Bone Proteins.
This data set consisted of 4 data sets constructed from several different types of experiment. The data was published by Cappellini E, Jensen LJ, Szklarczyk D, Ginolhac A, da Fonseca RA, Stafford TW, Holen SR, Collins MJ, Orlando L, Willerslev E, Gilbert MT, and Olsen JV. in J Proteome Res. 2011 Dec 14 (PubMed).
This data was a truly amazing example of what can be obtained using samples that have simply sat around outside for 43,000 years. The preservation of the detectable peptides was unexpectedly good. The experiments were state-of-the-art at all levels and the data should be examined extensively by any group interested in detecting amino acid polymorphisms associated with evolutionary change. The analysis in the original paper was correct at the top level (the proteins detected) but was less well done at the level of amino acid polymorphisms and side chain modifications. There are several more publications' worth of information in this extraordinary data.
Copyright © 2011, The Global Proteome Machine Organization.