Two draft maps of the human proteome have been published in the latest issue of Nature. The drafts were produced by two international research teams working independently of one another. Using mass spectrometry to analyze tissues, body fluids, and cells, the teams have catalogued the proteins found in the non-diseased state and identified novel proteins expressed from what was previously thought to be non-coding, or junk, DNA.
Though previous studies have collected protein data sets numbering in the tens of thousands, the current studies catalogue a range that covers more than 80% of the human proteome. The coverage was achieved by analyzing a large number of samples spanning many types of tissue. One team, led by Akhilesh Pandey, a researcher at Johns Hopkins University, analyzed 30 different types of tissue, including fetal tissue and hematopoietic cells. The other team, led by Bernhard Kuster of the Technische Universität München in Germany, studied 60 tissues and included body fluids and cancer cell lines in its analysis.
Although the studies were carried out independently, the two groups used similar approaches, and the results complement one another in significant and useful ways. Pandey and colleagues generated an entirely new set of data from their samples, analyzing them via high-resolution Fourier-transform mass spectrometry to identify proteins encoded by over 17,000 genes. Among the newly uncovered protein-coding regions were sequences previously thought to be pseudogenes, non-coding RNAs, and upstream open reading frames. The group covered areas that had not been addressed by earlier proteome analyses, providing a single-origin repository against which existing and future data can be measured. Their catalogued data set can be accessed through the Human Proteome Map.
In contrast, Kuster and colleagues combined existing protein analysis data with their own, which they obtained via mass spectrometry. The group identified a number of organ-specific proteins, as well as proteins expressed from sequences that had previously been mislabeled as long intergenic non-coding RNA (lincRNA). In addition, by comparing mRNA levels with protein expression, they found that the translation rate of a given transcript appears to be conserved across tissues. The group identified proteins encoded by over 18,000 genes. Their data set can be accessed at proteomicsdb.org.
Both studies revealed new evidence suggesting that translation occurs at sites previously dismissed as non-coding. The findings highlight the importance of proteomics and its ability to inform genomics. The relevance of the novel proteins and the role played by post-translational modifications are two areas in which the researchers hope to expand their work in the near future.
Both studies can be found in the latest issue of Nature and can be accessed online at www.nature.com.