An overview of the PRIDE ecosystem of
resources and computational tools for
mass spectrometry proteomics data
Dr. Juan Antonio Vizcaíno
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
Mass Spectrometry and Proteomics Congress 2016
London, 15 November 2016
Overview
• PRIDE Archive and ProteomeXchange
• PRIDE tools
• Reuse of public proteomics data
• PRIDE added-value resources: PRIDE Cluster and
PRIDE Proteomes
What is a proteomics publication in 2016?
• Proteomics studies generate potentially large amounts of
data and results.
• Ideally, a proteomics publication needs to:
• Summarize the results of the study
• Provide supporting information for the reliability of any
results reported
• Information in a publication:
• Manuscript
• Supplementary material
• Associated data submitted to a public repository
• PRIDE stores mass spectrometry (MS)-based
proteomics data:
• Peptide and protein expression data
(identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
ProteomeXchange: A global, distributed proteomics database
[Figure: ProteomeXchange member repositories and the data they handle (raw data, identification/quantification results, metadata): PASSEL (SRM data), PRIDE (MS/MS data), MassIVE (MS/MS data) and jPOST (MS/MS data, new in 2016)]
• Mandatory raw data deposition since July 2015.
• Goal: development of a framework to allow standard data submission and
dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
ProteomeXchange data workflow
[Figure: ProteomeXchange data workflow. Elements shown: research groups (researcher's results, raw data, metadata); receiving repositories PRIDE, MassIVE and jPOST (MS/MS data) and PASSEL (SRM data); MS/MS data as complete submissions; any other workflow (mainly partial submissions); ProteomeCentral (datasets); journals (metadata/manuscript, raw data, results); reanalysis of datasets, with reprocessed results in PeptideAtlas and MassIVE.]
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
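The distinction between complete and partial submissions can be sketched as a small classification rule. This is a simplified illustration, not the actual ProteomeXchange or PRIDE validation logic. The file categories (`RAW`, `RESULT`, `SEARCH`) and the set of "standard" result formats are assumptions chosen for the example. The raw-file check reflects the mandatory raw data deposition introduced in July 2015.

```python
from dataclasses import dataclass

# Assumption for illustration: result formats treated as "standard",
# i.e. parseable by the repository, for a complete submission.
STANDARD_RESULT_FORMATS = {"mzIdentML", "PRIDE XML"}


@dataclass(frozen=True)
class SubmittedFile:
    name: str
    kind: str  # hypothetical category, e.g. "RAW", "RESULT", "SEARCH", "PEAK"
    fmt: str   # file format, e.g. "mzIdentML", "Thermo RAW"


def submission_type(files: list[SubmittedFile]) -> str:
    """Classify a dataset as a COMPLETE or PARTIAL submission (simplified)."""
    if not any(f.kind == "RAW" for f in files):
        # Raw data deposition is mandatory in ProteomeXchange since July 2015.
        raise ValueError("a ProteomeXchange submission must include raw files")
    has_standard_results = any(
        f.kind == "RESULT" and f.fmt in STANDARD_RESULT_FORMATS for f in files
    )
    # Complete: results in a standard format the repository can parse.
    # Partial: anything else, e.g. search engine output in a native format.
    return "COMPLETE" if has_standard_results else "PARTIAL"
```

Keeping the rule as a pure function over the file list makes it easy to test independently of any upload mechanism.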
ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
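Each ProteomeXchange dataset is identified by an accession: the prefix PXD followed by six digits, e.g. PXD001471. Below is a minimal sketch that validates an accession and builds a ProteomeCentral link. It assumes the `GetDataset` script selects the dataset through an `ID` query parameter; verify that assumption against the ProteomeCentral documentation.

```python
import re
from urllib.parse import urlencode

PXD_PATTERN = re.compile(r"PXD\d{6}")
PROTEOMECENTRAL_BASE = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def proteomecentral_url(accession: str) -> str:
    """Return a ProteomeCentral link for a PXD accession (query format assumed)."""
    accession = accession.strip().upper()
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a ProteomeXchange accession: {accession!r}")
    # Assumption: the dataset is selected with an "ID" query parameter.
    return f"{PROTEOMECENTRAL_BASE}?{urlencode({'ID': accession})}"
```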
ProteomeXchange data workflow
[Figure: the ProteomeXchange data workflow extended with downstream resources: UniProt/neXtProt, PeptideAtlas, GPMDB, proteomicsDB and other databases, plus OmicsDI for integration with other omics datasets.]
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides, or
will soon provide, MS proteomics
data to other EMBL-EBI resources
such as UniProt, Ensembl and the
EBI Expression Atlas.
http://www.ebi.ac.uk/pride/archive
PRIDE Archive – over 4,500 datasets from
over 51 countries and 1,700 groups
• USA – 814 datasets
• Germany – 528
• UK – 338
• China – 328
• France – 222
• Netherlands – 175
• Canada – 137
Data volume:
• Total: ~275 TB
• Number of all files: ~560,000
• PXD000320-324: ~4 TB
• PXD002319-26: ~2.4 TB
• PXD001471: ~1.6 TB
• 1,973 datasets (i.e. 52% of all) are publicly accessible
• PRIDE holds ~90% of all ProteomeXchange datasets
[Figure: PRIDE Archive growth, submissions per year (all submissions vs. complete submissions)]
In the last 12 months: ~165 submitted datasets per month
Top species (studied in at least 100 datasets):
• 2,010 Homo sapiens
• 604 Mus musculus
• 191 Saccharomyces cerevisiae
• 140 Arabidopsis thaliana
• 127 Rattus norvegicus
>900 reported taxa in total
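The data-volume bullets use a shorthand for consecutive accessions. PXD000320-324 covers PXD000320 to PXD000324, and PXD002319-26 covers PXD002319 to PXD002326. The sketch below expands this notation. It assumes the digits after the hyphen always replace the trailing digits of the first accession.

```python
import re

RANGE_PATTERN = re.compile(r"PXD(\d{6})(?:-(\d{1,6}))?")


def expand_pxd_range(spec: str) -> list[str]:
    """Expand shorthand such as 'PXD002319-26' into individual accessions."""
    match = RANGE_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised accession range: {spec!r}")
    start_digits, end_suffix = match.groups()
    if end_suffix is None:
        return [f"PXD{start_digits}"]
    # The suffix overwrites the last len(end_suffix) digits of the start.
    end_digits = start_digits[: len(start_digits) - len(end_suffix)] + end_suffix
    start, end = int(start_digits), int(end_digits)
    if end < start:
        raise ValueError(f"range end precedes start: {spec!r}")
    return [f"PXD{n:06d}" for n in range(start, end + 1)]
```

For example, `expand_pxd_range("PXD000320-324")` returns the five accessions PXD000320 to PXD000324.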
Overview
• PRIDE Archive and ProteomeXchange
• PRIDE tools
• Reuse of public proteomics data
• PRIDE added-value resources: PRIDE Cluster and
PRIDE Proteomes
PRIDE Components: Data Submission Process
[Figure: submission components PRIDE Converter 2, PRIDE Inspector and the PX Submission Tool, handling mzIdentML and PRIDE XML files]
In addition to PRIDE Archive, the PRIDE team develops
and maintains different tools and software libraries to
facilitate the handling and visualisation of MS proteomics
data and the submission process.
PRIDE Inspector Toolsuite
Wang et al., Nat Biotechnol, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016
• PRIDE Inspector: standalone tool to enable visualisation and validation of MS
data.
• Built on top of ms-data-core-api, a set of open source algorithms and libraries for
computational proteomics.
• Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE
XML.
• Broad functionality.
https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector
[Screenshots: summary and QC charts; peptide spectrum annotation and visualisation]
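Because PRIDE Inspector reads several formats, a tool working alongside it first has to decide which format a file is in. The sketch below is an illustrative heuristic, not code taken from ms-data-core-api. It treats a file as mzTab if it contains metadata lines starting with `MTD` and a tab. Otherwise it identifies the XML formats by the local name of the root element: `MzIdentML`, `mzML` or `indexedmzML`, and `ExperimentCollection`, which is assumed here to be the PRIDE XML root.

```python
import xml.etree.ElementTree as ET

# Root element local names of the XML formats (the PRIDE XML root is an
# assumption: ExperimentCollection, as in PRIDE XML 2.1).
XML_ROOTS = {
    "MzIdentML": "mzIdentML",
    "mzML": "mzML",
    "indexedmzML": "mzML",
    "ExperimentCollection": "PRIDE XML",
}


def sniff_format(head: bytes) -> str:
    """Guess the format of a file from its first few kilobytes."""
    text = head.decode("utf-8", errors="replace").lstrip("\ufeff")
    # mzTab is tab-separated text whose metadata section uses 'MTD' lines.
    if text.startswith("MTD\t") or "\nMTD\t" in text:
        return "mzTab"
    # For XML, the first start tag is the root element. An incremental parser
    # tolerates a truncated head as long as the root start tag is complete.
    parser = ET.XMLPullParser(events=("start",))
    try:
        parser.feed(head)
        for _event, element in parser.read_events():
            local_name = element.tag.rsplit("}", 1)[-1]  # strip namespace
            return XML_ROOTS.get(local_name, "unknown")
    except ET.ParseError:
        pass  # not well-formed XML: fall through to "unknown"
    return "unknown"


def sniff_file(path: str, head_size: int = 8192) -> str:
    """Read only the beginning of a (possibly very large) file and sniff it."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(head_size))
```

Reading only the first few kilobytes keeps the check cheap even for multi-gigabyte mzML files.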
PX Submission Tool
• Desktop application for data
submissions to ProteomeXchange via
PRIDE
• Implemented in Java 7
• Streamlines the submission process
• Captures mappings between files
• Retains metadata
• Fast file transfer with Aspera (FASP®
transfer technology); FTP also
available
• Command line option
[Screenshot: PX Submission Tool]
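Capturing mappings between files means recording which raw or peak list files each result file was derived from. The sketch below represents such a mapping as a plain dictionary and checks for result files whose mapped files are missing from the submission. The data layout and the function name are hypothetical. They are not the PX Submission Tool's internal model, which is implemented in Java.

```python
def find_broken_mappings(
    mappings: dict[str, list[str]], submitted: set[str]
) -> dict[str, list[str]]:
    """Report result files that are unmapped or that reference missing files.

    `mappings` maps each result file to the raw/peak list files it was
    derived from; `submitted` is the set of file names in the submission.
    The returned dict maps each problematic result file to its missing files.
    """
    problems: dict[str, list[str]] = {}
    for result_file, sources in mappings.items():
        missing = [name for name in sources if name not in submitted]
        if not sources or missing:
            # An empty source list means the result file was never mapped.
            problems[result_file] = missing
    return problems
```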
Overview
• PRIDE Archive and ProteomeXchange
• PRIDE tools
• Reuse of public proteomics data
• PRIDE added-value resources: PRIDE Cluster and
PRIDE Proteomes
Datasets are being reused more and more
Vaudel et al., Proteomics, 2016
Data download volume for
PRIDE Archive in 2015: 198 TB
[Figure: PRIDE Archive data downloads per year in TB, 2013 to 2016]
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
Draft human proteome papers published in 2014
Wilhelm et al., Nature, 2014
Kim et al., Nature, 2014
• Two independent groups claimed to have produced the
first complete draft of the human proteome by MS.
• Some of their findings are controversial and need further
validation, but they generated a lot of discussion and put
proteomics in the spotlight.
• They used many different tissues.
[Image: Nature cover, 29 May 2014]
Draft human proteome papers published in 2014
Wilhelm et al., Nature, 2014
• Around 60% of the data used for the
analysis came from previous
experiments, most of them stored in
proteomics repositories such as
PRIDE/ProteomeXchange, PASSEL or
MassIVE.
• They complemented these data with “exotic”
tissues.
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
Examples of repurposing in proteogenomics
Public datasets from different omics: OmicsDI
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate ‘omics’ datasets (proteomics,
transcriptomics, metabolomics and genomics at present).
PRIDE
MassIVE
jPOST
PASSEL
GPMDB
ArrayExpress
Expression Atlas
MetaboLights
Metabolomics Workbench
GNPS
EGA
Perez-Riverol et al., Nat Biotechnol, in press
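Integrating datasets from repositories with different schemas comes down to mapping each record onto a shared set of fields and then searching that common index. The sketch below illustrates only that idea. It assumes records have already been normalised. The field names and the keyword search are hypothetical and do not reflect the OmicsDI data model or its web API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical common schema shared by all source repositories."""
    repository: str  # e.g. "PRIDE", "MetaboLights", "ArrayExpress"
    accession: str
    omics_type: str  # e.g. "proteomics", "metabolomics", "transcriptomics"
    title: str
    keywords: frozenset[str]


def search(
    index: list[DatasetRecord], term: str, omics_type: Optional[str] = None
) -> list[DatasetRecord]:
    """Case-insensitive search over titles and keywords of all repositories."""
    term = term.lower()
    hits = []
    for record in index:
        if omics_type is not None and record.omics_type != omics_type:
            continue  # optional filter on the omics technology
        in_title = term in record.title.lower()
        in_keywords = term in {k.lower() for k in record.keywords}
        if in_title or in_keywords:
            hits.append(record)
    return hits
```

Because every repository is mapped onto the same record type, a single query can return proteomics, transcriptomics and metabolomics datasets side by side.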
OmicsDI: Portal for omics datasets
Overview
• PRIDE Archive and ProteomeXchange
• PRIDE tools
• Reuse of public proteomics data
• PRIDE added-value resources: PRIDE Cluster and
PRIDE Proteomes
Added-value resources: PRIDE Cluster
and PRIDE Proteomes
• Condensed, cross-dataset, QC-filtered view of
PRIDE data.
• PRIDE Cluster: peptide-centric.
• PRIDE Proteomes: protein-centric (identification data).
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
PRIDE Cluster
• Provides an aggregated, peptide-centric view of PRIDE Archive.
• Hypothesis: the same peptide will generate similar MS/MS spectra across
experiments.
• A new version of the spectrum clustering algorithm reliably groups spectra
coming from the same peptide.
• Enables QC of peptide-spectrum matches (PSMs): reliable identifications are
inferred by comparing the submitted identifications of the spectra within a
cluster.
• After clustering, a representative spectrum is built for all peptides
consistently identified across different datasets.
• Used to build spectral libraries (for 16 species).
Griss et al., Nat Methods, 2013
Griss et al., Nat Methods, 2016
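The clustering rests on the hypothesis that spectra of the same peptide look alike. The sketch below is a deliberately simplified stand-in, not the algorithm described by Griss et al. It works in four steps:

• Peaks are binned on m/z.
• Spectra are compared by cosine similarity.
• Each spectrum joins the most similar existing cluster at or above a threshold, or starts a new one.
• The representative (consensus) spectrum is the re-normalised mean of the members' binned vectors.

The 1.0 m/z bin width and the 0.8 threshold are arbitrary assumptions.

```python
import math
from dataclasses import dataclass, field

Spectrum = list[tuple[float, float]]  # list of (m/z, intensity) peaks


def binned_vector(peaks: Spectrum, bin_width: float = 1.0) -> dict[int, float]:
    """Sum intensities into m/z bins and L2-normalise the resulting vector."""
    bins: dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in bins.values()))
    return {k: v / norm for k, v in bins.items()} if norm > 0 else {}


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two L2-normalised sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(k, 0.0) for k, v in a.items())


@dataclass
class Cluster:
    members: list[int] = field(default_factory=list)  # indices of input spectra
    vector_sum: dict[int, float] = field(default_factory=dict)

    def add(self, index: int, vector: dict[int, float]) -> None:
        self.members.append(index)
        for k, v in vector.items():
            self.vector_sum[k] = self.vector_sum.get(k, 0.0) + v

    def consensus(self) -> dict[int, float]:
        """Representative spectrum: mean of member vectors, re-normalised."""
        norm = math.sqrt(sum(v * v for v in self.vector_sum.values()))
        return {k: v / norm for k, v in self.vector_sum.items()} if norm > 0 else {}


def greedy_cluster(spectra: list[Spectrum], threshold: float = 0.8) -> list[Cluster]:
    """Assign each spectrum to its most similar cluster, or start a new cluster."""
    clusters: list[Cluster] = []
    for index, peaks in enumerate(spectra):
        vector = binned_vector(peaks)
        best, best_score = None, threshold
        for cluster in clusters:
            score = cosine(vector, cluster.consensus())
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            best = Cluster()
            clusters.append(best)
        best.add(index, vector)
    return clusters
```

Because each member vector is normalised before it is added, the consensus weights every spectrum equally, regardless of its total ion current.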
Example: one perfect cluster
- 880 PSMs, all identified as the same peptide
- 4 species
- 28 datasets
- Same instruments
http://wwwdev.ebi.ac.uk/pride/cluster/
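Comparing the submitted identifications within a cluster can be summarised as the share of PSMs that agree with the most frequently reported peptide. In the perfect cluster above, all 880 PSMs report the same peptide, so the share is 1.0. The sketch below computes this share. The 0.7 share and 10-PSM size cut-offs for calling a cluster reliable are illustrative assumptions, not PRIDE Cluster's published parameters.

```python
from collections import Counter


def cluster_agreement(peptide_ids: list[str]) -> tuple[str, float]:
    """Return the most common peptide in a cluster and the share of PSMs reporting it."""
    if not peptide_ids:
        raise ValueError("cluster has no identified spectra")
    peptide, count = Counter(peptide_ids).most_common(1)[0]
    return peptide, count / len(peptide_ids)


def is_reliable(peptide_ids: list[str], min_share: float = 0.7, min_size: int = 10) -> bool:
    """Flag clusters that are large enough and mostly agree (thresholds hypothetical)."""
    _, share = cluster_agreement(peptide_ids)
    return len(peptide_ids) >= min_size and share >= min_share
```

In practice the peptide strings would usually include modifications, so that differently modified forms of a sequence are not counted as agreeing.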
PRIDE Proteomes web interface: identification info
[Screenshot annotations: unique/shared peptides; mass spec-based sequence coverage; detected PTMs; observed tissues; biological vs. sample preparation PTMs]
http://wwwdev.ebi.ac.uk/pride/proteomes/
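Two of the values in the identification view can be derived directly from peptide identifications. Mass spec-based sequence coverage is the fraction of a protein's residues covered by at least one observed peptide. The unique/shared label records whether a peptide maps to one protein or to several. The minimal sketch below works over a toy protein dictionary. It assumes exact substring matching of peptide sequences and ignores complications such as isoleucine/leucine ambiguity.

```python
def peptide_positions(protein: str, peptide: str) -> list[int]:
    """All start positions of a peptide in a protein (overlapping matches allowed)."""
    positions, start = [], protein.find(peptide)
    while start != -1:
        positions.append(start)
        start = protein.find(peptide, start + 1)
    return positions


def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        for start in peptide_positions(protein, peptide):
            for i in range(start, start + len(peptide)):
                covered[i] = True
    return sum(covered) / len(protein) if protein else 0.0


def classify_peptides(proteins: dict[str, str], peptides: list[str]) -> dict[str, str]:
    """Label each peptide 'unique' (one protein), 'shared' (several) or 'unmapped'."""
    labels = {}
    for peptide in peptides:
        hits = [acc for acc, seq in proteins.items() if peptide in seq]
        labels[peptide] = "unique" if len(hits) == 1 else "shared" if hits else "unmapped"
    return labels
```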
Conclusions
• PRIDE Archive and ProteomeXchange have become the
standard platform for public data deposition in proteomics.
• PRIDE Inspector: support for data standards.
• PX submission tool.
• Reuse of public proteomics data is increasing: many
opportunities for data miners.
• OmicsDI: new platform to identify public datasets coming
from different omics technologies (more possibilities for data
reuse!).
• PRIDE Cluster and PRIDE Proteomes.
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez
Former team members, especially
Rui Wang, Florian Reisinger, Noemi
del Toro, Jose A. Dianes & Henning
Hermjakob
Acknowledgements: The PRIDE Team
All data submitters!
@pride_ebi
@proteomexchange
Questions?
http://www.slideshare.net/JuanAntonioVizcaino

More Related Content

PPTX
Mining the hidden proteome using hundreds of public proteomics datasets
Juan Antonio Vizcaino
 
PPTX
Experiences to learn from the MS proteomics field
Juan Antonio Vizcaino
 
PPTX
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Juan Antonio Vizcaino
 
PPTX
Proteomics public data resources: enabling "big data" analysis in proteomics
Juan Antonio Vizcaino
 
PDF
Pride cluster presentation
Juan Antonio Vizcaino
 
PPTX
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
Juan Antonio Vizcaino
 
PPTX
Mass spectrometry resources at the EBI
Juan Antonio Vizcaino
 
PPTX
PRIDE-ProteomeXchange
Juan Antonio Vizcaino
 
Mining the hidden proteome using hundreds of public proteomics datasets
Juan Antonio Vizcaino
 
Experiences to learn from the MS proteomics field
Juan Antonio Vizcaino
 
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Juan Antonio Vizcaino
 
Proteomics public data resources: enabling "big data" analysis in proteomics
Juan Antonio Vizcaino
 
Pride cluster presentation
Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
Juan Antonio Vizcaino
 
Mass spectrometry resources at the EBI
Juan Antonio Vizcaino
 
PRIDE-ProteomeXchange
Juan Antonio Vizcaino
 

What's hot (20)

PPTX
Reuse of public proteomics data
Juan Antonio Vizcaino
 
PPTX
Proteomics data standards
Juan Antonio Vizcaino
 
PPTX
Proteomics repositories
Juan Antonio Vizcaino
 
PPTX
ProteomeXchange update HUPO 2016
Juan Antonio Vizcaino
 
PPTX
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Juan Antonio Vizcaino
 
PPTX
Proteomics data standards
Juan Antonio Vizcaino
 
PPTX
ProteomeXchange_and_PRIDE_Semmeting_2015
Juan Antonio Vizcaino
 
PPT
The UK National Chemical Database Service – an integration of commercial and ...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PPT
Royal society of chemistry activities to develop a data repository for chemis...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PPT
How the InChI identifier is used to underpin our online chemistry databases a...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PPTX
How to run and maintain a popular biological data repository?
Juan Antonio Vizcaino
 
PPT
The importance of standards for data exchange and interchange on the Royal So...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PPTX
Investigating Impact Metrics for Performance for the US-EPA National Center f...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PPTX
Mass Spectrometry Informatics formats in progress
Juan Antonio Vizcaino
 
PPTX
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
Juan Antonio Vizcaino
 
PPTX
Human microbiome project
Juan Antonio Vizcaino
 
PPT
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PPT
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PPT
The application of text and data mining to enhance the RSC publication archive
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Reuse of public proteomics data
Juan Antonio Vizcaino
 
Proteomics data standards
Juan Antonio Vizcaino
 
Proteomics repositories
Juan Antonio Vizcaino
 
ProteomeXchange update HUPO 2016
Juan Antonio Vizcaino
 
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Juan Antonio Vizcaino
 
Proteomics data standards
Juan Antonio Vizcaino
 
ProteomeXchange_and_PRIDE_Semmeting_2015
Juan Antonio Vizcaino
 
The UK National Chemical Database Service – an integration of commercial and ...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Royal society of chemistry activities to develop a data repository for chemis...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
How the InChI identifier is used to underpin our online chemistry databases a...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
How to run and maintain a popular biological data repository?
Juan Antonio Vizcaino
 
The importance of standards for data exchange and interchange on the Royal So...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Investigating Impact Metrics for Performance for the US-EPA National Center f...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Mass Spectrometry Informatics formats in progress
Juan Antonio Vizcaino
 
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
Juan Antonio Vizcaino
 
Human microbiome project
Juan Antonio Vizcaino
 
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
The application of text and data mining to enhance the RSC publication archive
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Ad

Similar to An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data (20)

PPTX
PRIDE and ProteomeXchange: Training webinar
Juan Antonio Vizcaino
 
PPTX
Pride and ProteomeXchange
Juan Antonio Vizcaino
 
PPTX
PRIDE and ProteomeXchange
Juan Antonio Vizcaino
 
PDF
ProteomeXchange update
Juan Antonio Vizcaino
 
PPTX
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
Juan Antonio Vizcaino
 
PDF
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
Juan Antonio Vizcaino
 
PPTX
Proteomexchange
Juan Antonio Vizcaino
 
PPTX
Is it feasible to identify novel biomarkers by mining public proteomics data?
Juan Antonio Vizcaino
 
PDF
PRIDE resources and ProteomeXchange
Juan Antonio Vizcaino
 
PPTX
ProteomeXchange: data deposition and data retrieval made easy
Juan Antonio Vizcaino
 
PPTX
Introduction to EBI for Proteomics in ELIXIR
Juan Antonio Vizcaino
 
PPTX
ProteomeXchange update 2017
Juan Antonio Vizcaino
 
PPTX
ProteomeXchange update
Juan Antonio Vizcaino
 
PDF
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
Yasset Perez-Riverol
 
PPTX
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
Juan Antonio Vizcaino
 
PPTX
Do we need to make public our proteomics data?
Yasset Perez-Riverol
 
PPTX
Proteomics repositories
Juan Antonio Vizcaino
 
PDF
Proteomics repositories
Juan Antonio Vizcaino
 
PPTX
Reuse of public data in proteomics
Juan Antonio Vizcaino
 
PPTX
Proteomics repositories
Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange: Training webinar
Juan Antonio Vizcaino
 
Pride and ProteomeXchange
Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange
Juan Antonio Vizcaino
 
ProteomeXchange update
Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
Juan Antonio Vizcaino
 
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
Juan Antonio Vizcaino
 
Proteomexchange
Juan Antonio Vizcaino
 
Is it feasible to identify novel biomarkers by mining public proteomics data?
Juan Antonio Vizcaino
 
PRIDE resources and ProteomeXchange
Juan Antonio Vizcaino
 
ProteomeXchange: data deposition and data retrieval made easy
Juan Antonio Vizcaino
 
Introduction to EBI for Proteomics in ELIXIR
Juan Antonio Vizcaino
 
ProteomeXchange update 2017
Juan Antonio Vizcaino
 
ProteomeXchange update
Juan Antonio Vizcaino
 
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
Yasset Perez-Riverol
 
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
Juan Antonio Vizcaino
 
Do we need to make public our proteomics data?
Yasset Perez-Riverol
 
Proteomics repositories
Juan Antonio Vizcaino
 
Proteomics repositories
Juan Antonio Vizcaino
 
Reuse of public data in proteomics
Juan Antonio Vizcaino
 
Proteomics repositories
Juan Antonio Vizcaino
 
Ad

More from Juan Antonio Vizcaino (16)

PDF
Reusing and integrating public proteomics data to improve our knowledge of th...
Juan Antonio Vizcaino
 
PPTX
Introduction to the PSI standard data formats
Juan Antonio Vizcaino
 
PDF
Reuse of public proteomics data
Juan Antonio Vizcaino
 
PDF
Introduction to the Proteomics Bioinformatics Course 2018
Juan Antonio Vizcaino
 
PDF
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
Juan Antonio Vizcaino
 
PPTX
PSI-Proteome Informatics update
Juan Antonio Vizcaino
 
PDF
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Juan Antonio Vizcaino
 
PDF
The ELIXIR Proteomics community
Juan Antonio Vizcaino
 
PDF
The ELIXIR Proteomics Community
Juan Antonio Vizcaino
 
PPTX
The ProteomeXchange Consoritum: 2017 update
Juan Antonio Vizcaino
 
PPTX
Reuse of public proteomics data
Juan Antonio Vizcaino
 
PPTX
Introduction to the Proteomics Bioinformatics Course 2017
Juan Antonio Vizcaino
 
PPTX
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
Juan Antonio Vizcaino
 
PPTX
Enabling automated processing and analysis of large-scale proteomics data
Juan Antonio Vizcaino
 
PPTX
The Proteomics Standards Initiative (PSI)
Juan Antonio Vizcaino
 
PPTX
Introduction to the Proteomics Bioinformatics Course 2016
Juan Antonio Vizcaino
 
Reusing and integrating public proteomics data to improve our knowledge of th...
Juan Antonio Vizcaino
 
Introduction to the PSI standard data formats
Juan Antonio Vizcaino
 
Reuse of public proteomics data
Juan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2018
Juan Antonio Vizcaino
 
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
Juan Antonio Vizcaino
 
PSI-Proteome Informatics update
Juan Antonio Vizcaino
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Juan Antonio Vizcaino
 
The ELIXIR Proteomics community
Juan Antonio Vizcaino
 
The ELIXIR Proteomics Community
Juan Antonio Vizcaino
 
The ProteomeXchange Consoritum: 2017 update
Juan Antonio Vizcaino
 
Reuse of public proteomics data
Juan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2017
Juan Antonio Vizcaino
 
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
Juan Antonio Vizcaino
 
Enabling automated processing and analysis of large-scale proteomics data
Juan Antonio Vizcaino
 
The Proteomics Standards Initiative (PSI)
Juan Antonio Vizcaino
 
Introduction to the Proteomics Bioinformatics Course 2016
Juan Antonio Vizcaino
 

Recently uploaded (20)

PDF
Multiwavelength Study of a Hyperluminous X-Ray Source near NGC6099: A Strong ...
Sérgio Sacani
 
PPTX
Reticular formation_nuclei_afferent_efferent
muralinath2
 
PPTX
The Toxic Effects of Aflatoxin B1 and Aflatoxin M1 on Kidney through Regulati...
OttokomaBonny
 
PPT
1a. Basic Principles of Medical Microbiology Part 2 [Autosaved].ppt
separatedwalk
 
PPT
Grade_9_Science_Atomic_S_t_r_u_cture.ppt
QuintReynoldDoble
 
PDF
Even Lighter Than Lightweiht: Augmenting Type Inference with Primitive Heuris...
ESUG
 
PPTX
Role of GIS in precision farming.pptx
BikramjitDeuri
 
PPTX
General Characters and Classification of Su class Apterygota.pptx
Dr Showkat Ahmad Wani
 
PPTX
first COT (MATH).pptxCSAsCNKHPHCouAGSCAUO:GC/ZKVHxsacba
DitaSIdnay
 
PDF
Identification of unnecessary object allocations using static escape analysis
ESUG
 
PPTX
Pharmacognosy: ppt :pdf :pharmacognosy :
Vishnukanchi darade
 
PPTX
Quality control test for plastic & metal.pptx
shrutipandit17
 
PPTX
Hepatopulmonary syndrome power point presentation
raknasivar1997
 
PDF
An Analysis of Inline Method Refactoring
ESUG
 
PDF
Migrating Katalon Studio Tests to Playwright with Model Driven Engineering
ESUG
 
PDF
The Cosmic Symphony: How Photons Shape the Universe and Our Place Within It
kutatomoshi
 
PPTX
Hericium erinaceus, also known as lion's mane mushroom
TinaDadkhah1
 
PDF
Gamifying Agent-Based Models in Cormas: Towards the Playable Architecture for...
ESUG
 
PDF
study of microbiologically influenced corrosion of 2205 duplex stainless stee...
ahmadfreak180
 
PPTX
Hydrocarbons Pollution. OIL pollutionpptx
AkCreation33
 
Multiwavelength Study of a Hyperluminous X-Ray Source near NGC6099: A Strong ...
Sérgio Sacani
 
Reticular formation_nuclei_afferent_efferent
muralinath2
 
The Toxic Effects of Aflatoxin B1 and Aflatoxin M1 on Kidney through Regulati...
OttokomaBonny
 
1a. Basic Principles of Medical Microbiology Part 2 [Autosaved].ppt
separatedwalk
 
Grade_9_Science_Atomic_S_t_r_u_cture.ppt
QuintReynoldDoble
 
Even Lighter Than Lightweiht: Augmenting Type Inference with Primitive Heuris...
ESUG
 
Role of GIS in precision farming.pptx
BikramjitDeuri
 
General Characters and Classification of Su class Apterygota.pptx
Dr Showkat Ahmad Wani
 
first COT (MATH).pptxCSAsCNKHPHCouAGSCAUO:GC/ZKVHxsacba
DitaSIdnay
 
Identification of unnecessary object allocations using static escape analysis
ESUG
 
Pharmacognosy: ppt :pdf :pharmacognosy :
Vishnukanchi darade
 
Quality control test for plastic & metal.pptx
shrutipandit17
 
Hepatopulmonary syndrome power point presentation
raknasivar1997
 
An Analysis of Inline Method Refactoring
ESUG
 
Migrating Katalon Studio Tests to Playwright with Model Driven Engineering
ESUG
 
The Cosmic Symphony: How Photons Shape the Universe and Our Place Within It
kutatomoshi
 
Hericium erinaceus, also known as lion's mane mushroom
TinaDadkhah1
 
Gamifying Agent-Based Models in Cormas: Towards the Playable Architecture for...
ESUG
 
study of microbiologically influenced corrosion of 2205 duplex stainless stee...
ahmadfreak180
 
Hydrocarbons Pollution. OIL pollutionpptx
AkCreation33
 

An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data

  • 1. An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data Dr. Juan Antonio Vizcaíno EMBL-European Bioinformatics Institute Hinxton, Cambridge, UK
  • 2. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 3. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 What is a proteomics publication in 2016? • Proteomics studies generate potentially large amounts of data and results. • Ideally, a proteomics publication needs to: • Summarize the results of the study • Provide supporting information for reliability of any results reported • Information in a publication: • Manuscript • Supplementary material • Associated data submitted to a public repository
  • 4. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 • PRIDE stores mass spectrometry (MS)-based proteomics data: • Peptide and protein expression data (identification and quantification) • Post-translational modifications • Mass spectra (raw data and peak lists) • Technical and biological metadata • Any other related information • Full support for tandem MS approaches PRIDE (PRoteomics IDEntifications) Archive http://www.ebi.ac.uk/pride/archive Martens et al., Proteomics, 2005 Vizcaíno et al., NAR, 2016
  • 5. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeXchange: A Global, distributed proteomics database PASSEL (SRM data) PRIDE (MS/MS data) MassIVE (MS/MS data) Raw ID/Q Meta jPOST (MS/MS data) Mandatory raw data deposition since July 2015 • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. http://www.proteomexchange.org New in 2016 Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 6. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals Peptide Atlas Receiving repositories PRIDE Researcher’s results Raw data Metadata PASSEL Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS SRM data Reprocessed results MassIVE ProteomeXchange data workflow Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 7. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral: Centralised portal for all PX datasets http://proteomecentral.proteomexchange.org/cgi/GetDataset
  • 8. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals Peptide Atlas Receiving repositories PRIDE Researcher’s results Raw data Metadata PASSEL Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS SRM data Reprocessed results MassIVE ProteomeXchange data workflow Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 9. Juan A. Vizcaíno [email protected] Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals UniProt/ neXtProtPeptide Atlas Other DBs Receiving repositories PRIDE GPMDBResearcher’s results Raw data Metadata PASSEL proteomicsDB Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS OmicsDI Integration with other omics datasets SRM data Reprocessed results MassIVE ProteomeXchange data workflow Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 10. PRIDE: source of MS proteomics data • PRIDE Archive already provides, or will soon provide, MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas. http://www.ebi.ac.uk/pride/archive
  • 11. PRIDE Archive: over 4,500 datasets from over 51 countries and 1,700 groups
    • Datasets by country: USA 814, Germany 528, UK 338, China 328, France 222, Netherlands 175, Canada 137
    • Data volume: ~275 TB in total, in ~560,000 files; large datasets include PXD000320-324 (~4 TB), PXD002319-26 (~2.4 TB) and PXD001471 (~1.6 TB)
    • 1,973 datasets (52% of all) are publicly accessible
    • PRIDE Archive holds ~90% of all ProteomeXchange datasets
    • In the last 12 months: ~165 submitted datasets per month
    • Top species (studied in at least 100 datasets): Homo sapiens (2,010), Mus musculus (604), Saccharomyces cerevisiae (191), Arabidopsis thaliana (140), Rattus norvegicus (127); >900 reported taxa in total
    • [Figure: PRIDE Archive growth, submissions per year (all submissions vs complete submissions)]
  • 12. Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 13. PRIDE components: data submission process [Figure: PRIDE Converter 2, PRIDE Inspector and the PX Submission Tool, together with the mzIdentML and PRIDE XML formats] • In addition to PRIDE Archive, the PRIDE team develops and maintains a range of tools and software libraries that facilitate the handling and visualisation of MS proteomics data and the submission process.
  • 14. PRIDE Inspector Toolsuite (Wang et al., Nat Biotechnol, 2012; Perez-Riverol et al., Bioinformatics, 2015; Perez-Riverol et al., MCP, 2016) • PRIDE Inspector: a standalone tool for the visualisation and validation of MS data. • Built on top of ms-data-core-api, a set of open-source algorithms and libraries for computational proteomics. • Supported file formats: mzIdentML, mzML and mzTab (PSI standards), and PRIDE XML. • Broad functionality, including summary and QC charts and peptide spectrum annotation and visualisation. https://github.com/PRIDE-Utilities/ms-data-core-api https://github.com/PRIDE-Toolsuite/pride-inspector
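mzTab, one of the formats PRIDE Inspector reads, is tab-delimited, and each line starts with a section prefix (for example `MTD` for metadata, `PSH` for the PSM table header and `PSM` for its rows). The Python sketch below reads only the PSM table and maps `null` cells to `None`. It illustrates the format and is not ms-data-core-api (a Java library); it ignores the other sections and the specification's validation rules, and the sample rows are made up.

```python
def read_mztab_psms(text: str) -> list[dict[str, object]]:
    """Collect PSM rows from mzTab text, keyed by the column names on the PSH line."""
    header = None  # column names taken from the PSH line
    psms = []
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "PSH":
            header = fields[1:]
        elif prefix == "PSM":
            if header is None:
                raise ValueError("PSM row found before the PSH header line")
            # mzTab writes missing values as the literal string "null".
            values = [None if v == "null" else v for v in fields[1:]]
            psms.append(dict(zip(header, values)))
    return psms


sample = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "PSH\tsequence\tPSM_ID\taccession\tcharge",
    "PSM\tPEPTIDER\t1\tP12345\t2",
    "PSM\tSAMPLEK\t2\tnull\t3",
])
psms = read_mztab_psms(sample)
print(sorted({p["sequence"] for p in psms}))  # prints ['PEPTIDER', 'SAMPLEK']
```

Keying each row by the PSH column names, rather than by position, keeps the reader working when a file carries optional or extra columns.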
  • 15. PX Submission Tool: desktop application for data submissions to ProteomeXchange via PRIDE • Implemented in Java 7 • Streamlines the submission process • Captures mappings between files • Retains metadata • Fast file transfer with Aspera (FASP® transfer technology); FTP also available • Command-line option [Submission tool screenshot]
  • 16. Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 17. Datasets are being reused more and more (Vaudel et al., Proteomics, 2016) • Data download volume for PRIDE Archive in 2015: 198 TB [Figure: PRIDE Archive downloads per year in TB, 2013 to 2016]
  • 18. Data sharing in Proteomics Vaudel et al., Proteomics, 2016
  • 19. Draft human proteome papers published in 2014 (Wilhelm et al., Nature, 2014; Kim et al., Nature, 2014) • Two independent groups claimed to have produced the first complete draft of the human proteome by MS. • Some of their findings are controversial and need further validation, but the papers generated a lot of discussion and put proteomics in the spotlight. • They used many different tissues. [Image: Nature cover, 29 May 2014]
  • 20. Draft human proteome papers published in 2014 (Wilhelm et al., Nature, 2014) • Around 60% of the data used for the analysis came from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE. • The authors complemented these data with “exotic” tissues.
  • 21. Data sharing in Proteomics Vaudel et al., Proteomics, 2016
  • 22. Examples of repurposing in proteogenomics
  • 23. Public datasets from different omics: OmicsDI http://www.ebi.ac.uk/Tools/omicsdi/ • Aims to integrate ‘omics’ datasets (at present proteomics, transcriptomics, metabolomics and genomics). • Integrated resources: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS and EGA. Perez-Riverol et al., Nat Biotechnol, in press
  • 24. OmicsDI: Portal for omics datasets
  • 26. Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 27. Added-value resources: PRIDE Cluster and PRIDE Proteomes • Condensed, cross-dataset, QC-filtered views of PRIDE data. • PRIDE Cluster: peptide-centric. • PRIDE Proteomes: protein-centric (identification data).
  • 28. Data sharing in Proteomics Vaudel et al., Proteomics, 2016
  • 29. PRIDE Cluster • Provides an aggregated, peptide-centric view of PRIDE Archive. • Hypothesis: the same peptide will generate similar MS/MS spectra across experiments. • New version of the spectral clustering algorithm to reliably group spectra coming from the same peptide. • Enables QC of peptide-spectrum matches (PSMs): reliable identifications are inferred by comparing the submitted identifications of the spectra within a cluster. • After clustering, a representative spectrum is built for all peptides consistently identified across different datasets. • Used to build spectral libraries (for 16 species). Griss et al., Nat Methods, 2013; Griss et al., Nat Methods, 2016
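The clustering idea can be illustrated with a deliberately simplified stand-in: spectra are binned, scaled to unit length and greedily assigned to the most similar cluster, comparing only against clusters whose average precursor m/z is close. The bin width, tolerances, cosine similarity and single-pass strategy are all assumptions chosen for clarity; the published PRIDE Cluster algorithm (Griss et al.) differs in its scoring and in how it iterates over the data. The consensus here, the normalised mean of the member vectors, stands in for the representative spectrum built after clustering.

```python
import math
from dataclasses import dataclass, field

BIN_WIDTH = 1.0        # assumed fragment m/z bin width
PRECURSOR_TOL = 2.0    # assumed precursor m/z tolerance
MIN_SIMILARITY = 0.7   # assumed cosine threshold for joining a cluster


def unit_vector(peaks):
    """Bin (m/z, intensity) peaks and scale the binned vector to unit length."""
    vec = {}
    for mz, intensity in peaks:
        if intensity > 0:
            b = int(mz // BIN_WIDTH)
            vec[b] = vec.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm else {}


def dot(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


@dataclass
class Cluster:
    precursor_mz: float = 0.0
    members: list = field(default_factory=list)  # indices of clustered spectra
    summed: dict = field(default_factory=dict)   # sum of member unit vectors

    def consensus(self):
        """Representative spectrum: the normalised mean of the members."""
        norm = math.sqrt(sum(v * v for v in self.summed.values()))
        return {k: v / norm for k, v in self.summed.items()} if norm else {}

    def add(self, index, precursor_mz, vec):
        n = len(self.members)
        self.precursor_mz = (self.precursor_mz * n + precursor_mz) / (n + 1)
        self.members.append(index)
        for k, v in vec.items():
            self.summed[k] = self.summed.get(k, 0.0) + v


def cluster_spectra(spectra):
    """Greedy single-pass clustering of (precursor m/z, peaks) tuples."""
    clusters = []
    for index, (precursor_mz, peaks) in enumerate(spectra):
        vec = unit_vector(peaks)
        best, best_sim = None, MIN_SIMILARITY
        for cluster in clusters:
            if abs(cluster.precursor_mz - precursor_mz) > PRECURSOR_TOL:
                continue  # only compare spectra with similar precursors
            sim = dot(vec, cluster.consensus())
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            best = Cluster()
            clusters.append(best)
        best.add(index, precursor_mz, vec)
    return clusters
```

On toy data the precursor window does visible work: two spectra with matching fragment bins merge, a spectrum with different fragments starts its own cluster, and identical fragments at a distant precursor m/z also start a new one.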
  • 30. Example: one perfect cluster • 880 PSMs give the same peptide ID • 4 species • 28 datasets • Same instruments http://wwwdev.ebi.ac.uk/pride/cluster/
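Cluster-level QC of PSMs can be sketched by measuring how strongly the submitted identifications in a cluster agree. Below, `max_ratio` is the share of PSMs carrying the most common peptide, so the perfect cluster on this slide (880 PSMs, one peptide ID) scores 1.0. The function names, the minimum cluster size and the ratio threshold are illustrative assumptions, not PRIDE Cluster's published settings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ClusterQC:
    consensus_peptide: str
    max_ratio: float  # share of PSMs that agree with the consensus peptide
    n_psms: int


def assess_cluster(peptide_ids: list[str]) -> ClusterQC:
    """Summarise how consistently the spectra in one cluster were identified."""
    if not peptide_ids:
        raise ValueError("cluster contains no identified spectra")
    peptide, count = Counter(peptide_ids).most_common(1)[0]
    return ClusterQC(peptide, count / len(peptide_ids), len(peptide_ids))


def suspicious_psms(peptide_ids: list[str], min_size: int = 10,
                    min_ratio: float = 0.7) -> list[int]:
    """Indices of PSMs that disagree with a large, consistent cluster.

    Small or inconsistent clusters are not used to judge individual PSMs.
    """
    qc = assess_cluster(peptide_ids)
    if qc.n_psms < min_size or qc.max_ratio < min_ratio:
        return []
    return [i for i, pep in enumerate(peptide_ids) if pep != qc.consensus_peptide]
```

In a cluster where nine PSMs agree and one reports a different peptide, the odd one out is flagged; a two-spectrum cluster is too small to overrule either identification under these thresholds.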
  • 31. PRIDE Proteomes web interface: identification information [Screenshot annotations: unique/shared peptides; mass spec-based sequence coverage; PTMs detected; observed tissues; biological vs sample preparation PTMs] http://wwwdev.ebi.ac.uk/pride/proteomes/
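Two of the values shown on the protein page, unique versus shared peptides and MS-based sequence coverage, reduce to simple string operations once peptides are mapped to protein sequences. This is a sketch assuming exact substring matching against a small, made-up protein dictionary; the helper names are hypothetical, and a real pipeline must also handle isoforms, I/L ambiguity and protein inference.

```python
def proteins_for_peptide(peptide: str, proteins: dict[str, str]) -> list[str]:
    """Accessions whose sequence contains the peptide (exact match)."""
    return [acc for acc, seq in proteins.items() if peptide in seq]


def unique_peptides(accession: str, peptides: list[str],
                    proteins: dict[str, str]) -> list[str]:
    """Peptides that map to this protein and to no other."""
    return [p for p in peptides if proteins_for_peptide(p, proteins) == [accession]]


def sequence_coverage(accession: str, peptides: list[str],
                      proteins: dict[str, str]) -> float:
    """Fraction of residues covered by at least one observed peptide."""
    seq = proteins[accession]
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for pep in peptides:
        if not pep:
            continue
        start = seq.find(pep)
        while start != -1:  # mark every occurrence, not just the first
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = seq.find(pep, start + 1)
    return sum(covered) / len(seq)
```

With two toy proteins sharing the N-terminal stretch MKTAY, that peptide counts as shared, while IAKQR is unique to the first protein and, together with MKTAY, covers all ten of its residues.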
  • 32. Conclusions • PRIDE Archive and ProteomeXchange have become the standard platform for public data deposition in proteomics. • PRIDE Inspector: support for data standards. • PX submission tool. • Reuse of public proteomics data is increasing: many opportunities for data miners. • OmicsDI: new platform to identify public datasets coming from different omics technologies (more possibilities for data reuse!). • PRIDE Cluster and PRIDE Proteomes.
  • 33. Acknowledgements: people • Attila Csordas, Tobias Ternent, Gerhard Mayer (de.NBI), Johannes Griss, Yasset Perez-Riverol, Manuel Bernal-Llinares, Andrew Jarnuczak, Enrique Perez • Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob • Acknowledgements: the PRIDE Team • All data submitters!!! • @pride_ebi @proteomexchange
  • 34. Questions? http://www.slideshare.net/JuanAntonioVizcaino