European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Rafael C Jimenez
ELIXIR CTO
EUDAT conference, 25 September 2014
Proteomics repositories integration
using EUDAT resources
Data submissions
2
Submissions
raw data
processed data
metadata
Data
repository
Search
Integration
Noble WS, MacCoss MJ (2012) Computational and Statistical Analysis of Protein Mass Spectrometry Data. PLoS Comput Biol 8(1): e1002296. doi:10.1371/journal.pcbi.1002296
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002296
Overview of shotgun proteomics data production
MKKKNIYSIRKLGVG
IASVTLGTLLISG
GVTPAANAAQHD
FYQVLNMPNLNADQ
RNGFIQSLK
DDPSQSANVKLN
4
Peptide sequences
Raw data Processed data
Metadata
Data examples
4
Raw data Processed data Metadata
DNA
Human
Liver
Mitochondria
W. Smith
…
Peptide
Mouse
Heart
Nucleus
J. Heinz
…
LPISASHSSK…
TTGTTATCCG…
… … …
Proteomics data in PRIDE
5
~85% raw data
ProteomeCentral
Metadata /
Manuscript
Raw Data*
Results
Journals
UniProt/
NeXtProt
Peptide Atlas
Other DBs
Receiving repositories
PASSEL
(SRM data)
PRIDE
(MS/MS data)
GPMDB
Researcher’s results Reprocessed results Raw data* Metadata
ProteomeXchange
Vizcaíno et al., Nature Biotechnology, 2014
• Framework to enable standard data submission and
dissemination pipelines between the main existing
proteomics resources.
7 Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2013
PRIDE (PRoteomics IDEntifications) database
Mass spectrometry
Origin:
152 USA
108 Germany
67 United Kingdom
53 Switzerland
48 Netherlands
42 China
42 Canada
41 France
36 Spain
33 Belgium
25 Australia
23 Sweden
17 Japan
16 Denmark
13 Norway
12 Finland
12 India
12 Taiwan
10 Italy
9 Republic of Korea
8 Austria
8 Ireland
8 Brazil
7 Singapore
5 Israel
5 Russia …
Type:
273 PRIDE complete
501 PRIDE partial
47 PeptideAtlas/PASSEL complete
Access:
38.3% PRIDE public
5.3% PASSEL public
56% PRIDE private
0.4% PASSEL private
Data volume:
Total: >40 TB
Number of all files: >120,000
PXD000320-324: ~ 5 TB
PXD000065: ~ 1.4 TB
Top species (each studied in at least 8 datasets):
381 Homo sapiens
100 Mus musculus
31 Arabidopsis thaliana
26 Saccharomyces cerevisiae
16 Escherichia coli
14 Rattus norvegicus
12 Mycobacterium tuberculosis
11 Drosophila melanogaster
~ 215 species in total
Submissions/year:
2012: 102
2013: 527
2014: 192
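As a quick consistency check on the figures above, the three dataset types and the three yearly submission counts appear to describe the same set of datasets, and the four access shares should cover all of them. The short Python sketch below only re-adds the numbers from the slide; the variable names are illustrative and not part of any PRIDE or ProteomeXchange tooling.

```python
# Figures copied from the slide; variable names are illustrative only.
datasets_by_type = {
    "PRIDE complete": 273,
    "PRIDE partial": 501,
    "PeptideAtlas/PASSEL complete": 47,
}
submissions_by_year = {2012: 102, 2013: 527, 2014: 192}
access_share_percent = {
    "PRIDE public": 38.3,
    "PASSEL public": 5.3,
    "PRIDE private": 56.0,
    "PASSEL private": 0.4,
}

# Sum each breakdown; the shares are rounded to absorb float noise.
total_by_type = sum(datasets_by_type.values())
total_by_year = sum(submissions_by_year.values())
total_share = round(sum(access_share_percent.values()), 1)

print(total_by_type, total_by_year, total_share)
```

Both breakdowns add up to the same number of datasets, and the access shares sum to 100%.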
Pilot evolution
• Use EUDAT
• Replication of ELIXIR data in EUDAT data centers
• Delegation of ELIXIR data in EUDAT data centers
• Adopt EUDAT
• Replication of ELIXIR data in ELIXIR data centers using EUDAT
technology
9
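The three options listed on this slide differ mainly in which sites end up holding raw data, results and metadata. The Python sketch below is one possible reading of the diagrams on the following slides, not a specification: the option names, the `Placement` class and the `placement` function are hypothetical, and the delegation case assumes that raw data is held only at the EUDAT data storage center while the central repository keeps metadata, results and a PID pointing to the raw files.

```python
from dataclasses import dataclass

# Site roles as read from the slide diagrams; the names are illustrative.
CENTRAL = "central repository"
EUDAT = "EUDAT data storage center"
NATIONAL = "national proteomics center"


@dataclass(frozen=True)
class Placement:
    """Which sites hold each kind of data under one pilot option."""
    raw_data: frozenset
    results: frozenset
    metadata: frozenset


def placement(option):
    """Return the assumed data placement for a pilot option."""
    if option == "replication_in_eudat":
        # The central repository keeps everything; raw data and metadata
        # are copied to an EUDAT data storage center.
        return Placement(
            raw_data=frozenset({CENTRAL, EUDAT}),
            results=frozenset({CENTRAL}),
            metadata=frozenset({CENTRAL, EUDAT}),
        )
    if option == "delegation_to_eudat":
        # Assumption: raw data lives only at EUDAT and is referenced by PID.
        return Placement(
            raw_data=frozenset({EUDAT}),
            results=frozenset({CENTRAL}),
            metadata=frozenset({CENTRAL, EUDAT}),
        )
    if option == "replication_in_elixir":
        # National centers and the central repository mirror each other,
        # using EUDAT technology but no EUDAT storage.
        return Placement(
            raw_data=frozenset({NATIONAL, CENTRAL}),
            results=frozenset({NATIONAL, CENTRAL}),
            metadata=frozenset({NATIONAL, CENTRAL}),
        )
    raise ValueError(f"unknown option: {option}")


for name in ("replication_in_eudat", "delegation_to_eudat", "replication_in_elixir"):
    print(name, sorted(placement(name).raw_data))
```

Under these assumptions, delegation is the only option in which the central repository no longer stores raw data itself.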
Replication of ELIXIR data in EUDAT data
centers
10
Central repository Data storage centers
Meta
data
Raw
Data
Meta
data
Results
Raw
Data
ProteomeCentral
Metadata /
Manuscript
Raw Data*
Results
Journals
UniProt/
NeXtProt
Peptide Atlas
Other DBs
Receiving repositories
GPMDB
Researcher’s results Reprocessed results Raw data* Metadata
Vizcaíno et al., Nature Biotechnology, 2014
Raw Data*
PASSEL
(SRM data)
PRIDE
(MS/MS data)
Replication of ELIXIR data in EUDAT data
centers
Delegation of ELIXIR data in EUDAT data
centers
12
Central repository Data storage centers
Meta
data
Raw
Data
Meta
data
Results
Raw
Data
ProteomeCentral
Metadata /
Manuscript
Raw Data*
Results
Journals
UniProt/
NeXtProt
Peptide Atlas
Other DBs
Receiving repositories
GPMDB
Researcher’s results Reprocessed results Raw data* Metadata
Vizcaíno et al., Nature Biotechnology, 2014
Raw Data*
PRIDE
(MS/MS data)
PASSEL
(SRM data)
Delegation of ELIXIR data in EUDAT data
centers
Replication of ELIXIR data in ELIXIR data centers
using EUDAT technology
14
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Plans
15
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Data storage centers
Meta
data
Raw
Data
1.- ELIXIR replication
2.- EUDAT replication
Plans
16
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Data storage centers
Meta
data
Raw
Data
3.- delegation
ELIXIR Pilot action
17
EUDAT services
18
File sharing model
19
CSC
BILS
Site B
Site C
EUDAT CDI ELIXIR
B2SAFE
B2SAFE
B2SAFE
B2SAFE
PRIDE
EMBL-EBI
Pilot – EUDAT adoption: ELIXIR replication
20
CSC
BILS
Site B
Site C
EUDAT CDI ELIXIR
B2SAFE
B2SAFE
B2SAFE
B2SAFE
PRIDE
EMBL-EBI
Central repository National proteomics centers
Meta
data
Results
Raw
Data
Meta
data
Results
Raw
Data
PIDs
21
ELIXIR
community center
ELIXIR
Data center 1
EUDAT
Data center 1
CSC PRIDE BILS
Status
• BILS
• Migrating from existing Swestore dCache to iRODS
• Testing compatibility with B2SAFE
• Latest iRODS not compatible with B2SAFE?
• PRIDE
• iRODS service installed
• B2SAFE module has been deployed at EMBL-EBI (PRIDE)
• Test B2SAFE replication PRIDE -> CSC
• DOI for datasets
• PID for dataset files
• Web service to associate datasets to dataset files
22
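The last three status items (a DOI per dataset, a PID per dataset file, and a web service linking the two) amount to a two-way lookup between datasets and their files. The Python sketch below is a minimal in-memory stand-in for such a service, assuming one DOI per dataset and one PID per file; `DatasetRecord`, `DatasetIndex`, the file names and the identifier strings are all hypothetical placeholders, not registered DOIs or EPIC handles.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One dataset with a dataset-level DOI and one PID per file."""
    accession: str                                  # ProteomeXchange accession
    doi: str                                        # placeholder, not a real DOI
    file_pids: dict = field(default_factory=dict)   # file name -> PID


class DatasetIndex:
    """In-memory stand-in for the dataset-to-files web service."""

    def __init__(self):
        self._by_doi = {}
        self._doi_by_pid = {}

    def register(self, record):
        """Store a dataset and index each of its file PIDs."""
        self._by_doi[record.doi] = record
        for pid in record.file_pids.values():
            self._doi_by_pid[pid] = record.doi

    def files_for(self, doi):
        """Return the file PIDs of a dataset, ordered by file name."""
        record = self._by_doi[doi]
        return [record.file_pids[name] for name in sorted(record.file_pids)]

    def dataset_for(self, pid):
        """Return the DOI of the dataset that a file PID belongs to."""
        return self._doi_by_pid[pid]


index = DatasetIndex()
index.register(DatasetRecord(
    accession="PXD000065",
    doi="doi:EXAMPLE/PXD000065",
    file_pids={"run1.raw": "pid:EXAMPLE/0001", "run2.raw": "pid:EXAMPLE/0002"},
))
print(index.files_for("doi:EXAMPLE/PXD000065"))
print(index.dataset_for("pid:EXAMPLE/0002"))
```

Keeping the reverse index from file PID to dataset DOI means a replicated file found at a storage center can be traced back to its dataset without scanning every record.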
Status
In progress
• Handle System Registration
• Test requests of EPIC/EUDAT identifiers
Open questions
• BILS local PIDs?
• Sync back from PRIDE to BILS for modifications/additions at PRIDE?
• Data push or pull model?
• Replication of processed data requires prior validation
23
Participants
EUDAT/CSC
• Jani Heikkinen
• Damien Lecarpentier
• Johannes Reetz
EMBL-EBI/systems
• Andy Jenkinson
• Steven Newhouse
24
BILS
• Mikael Borg
• Fredrik Levander
• Bengt Persson
EMBL-EBI/PRIDE
• Juan Antonio Vizcaíno
• Rui Wang
• Henning Hermjakob
ELIXIR Hub
• Rafael C Jimenez
European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Thank you for your attention
Delegation of raw data
26
processed data
metadata
Data
repository
PID
Submissions
Search
Integration
27
National proteomics centers
Meta
data
Results
Raw
Data
Central repository
Meta
data
Results
Raw
Data
Data storage centers
Meta
data
Raw
Data

Editor's Notes

  • #8: Proteomics is the large-scale study of proteins, particularly their structures and functions. Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge (m/z) ratio of charged particles.