How FAIR friendly is the
FAIRDOMHub?
Exposing metadata to EOSC
Carole Goble
University of Manchester, UK & FAIRDOM e.V.
carole.goble@manchester.ac.uk
EOSCpilot workshop How FAIR friendly is your Data Catalogue?
Open Science Fair, 6-8 Sept 2017, Athens
FAIRDOM http://fair-dom.org
Systems and Synthetic Biology
Project Commons
Community driven RI
Since 2008
FAIRDOMHub
Public Commons, VRE and Catalogue
An installation of
the SEEK Platform
http://fairdomhub.org
80 Projects
30 Installations
Project-Centric Commons
people, expertise
Self managed spaces
Yellow pages
Multi-partner projects
FAIR Content, FAIR Projects
What methods are being used to determine
enzyme activity?
What SOP was used for this
sample?
Where is the validation data for this model?
Is there any group generating kinetic data?
Is this data available?
Track versions of my model
What's the relationship between the data and
model?
Which data belong to
which publications?
One place Asset Catalogue
federated types, federated stores, packaging
Multi-results & Versions
Data of many types…
Primary, secondary, tertiary…
Methods, Models, Scripts …
Structured organisation
Spans repository silos
Regardless of location
• In house project stores
• Subject specialist public archives
• General archives
• Internal FAIRDOM stores
More than datasets:
Structured Research Objects
16 datafiles (kinetic, flux inhibition, runout)
19 models (kinetics, validation)
13 SOPs
3 studies (model analysis, construction,
validation)
24 assays/analyses (simulations, model
characterisations)
Penkler, G., du Toit, F., Adams, W., Rautenbach, M.,
Palm, D. C., van Niekerk, D. D. and Snoep, J. L. (2015),
Construction and validation of a detailed kinetic model
of glycolysis in Plasmodium falciparum. FEBS J, 282:
1481–1511. doi:10.1111/febs.13237
Investigation
Study Analysis
Data
Model
SOP(Assay)
https://fairdomhub.org/investigations/56
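The Investigation / Study / Assay nesting on this slide can be sketched as a small data structure. This is only an illustrative sketch, not the SEEK data model: the class fields, study titles and asset file names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """An assay or analysis that groups data files, models and SOPs."""
    title: str
    assets: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    url: str
    studies: List[Study] = field(default_factory=list)

    def all_assets(self) -> List[str]:
        """Flatten every asset reachable from this investigation."""
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
        ]


# Hypothetical content; only the investigation URL comes from the slide.
investigation = Investigation(
    title="Glycolysis in Plasmodium falciparum",
    url="https://fairdomhub.org/investigations/56",
    studies=[
        Study("Model construction", [Assay("Kinetics", ["kinetics.xlsx", "sop.pdf"])]),
        Study("Model validation", [Assay("Simulation", ["model.xml"])]),
    ],
)
```

Walking the tree from the top gives the "one place" view of everything that belongs to a publication, wherever the files are stored.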
Accessible < Open
sharing sensitivities
Permission controls
Staged sharing
Licenses
Negotiated access
Embargos
Self managed
spaces
Access content and applications
resolution, execution, reproducibility
SBML Model simulation
Model comparison
Model versioning
Reproducing simulations
[Jacky Snoep, Dagmar Waltemath, Martin Peters, Martin Scharm]
Metadata Framework
FAIR Catalogue, FAIR Content
Schema
Dublin Core
DataCite,
DCAT, Bioschemas
Catalogue
Level
Investigation
Studies
Assay/Analysis
Entry
level
Entry
level
Persistent Identifiers
ORCID, DOI
Identifiers.org
Native identifier URLs
Community conventions
PIDs for all levels of content
Record level: subject thematic standards
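As a rough illustration of the identifier conventions listed above, an Identifiers.org reference is a compact `prefix:accession` pair appended to the resolver address. The helper names below are hypothetical, and the sketch assumes the `doi` prefix is registered with the resolver.

```python
def compact_identifier(prefix: str, accession: str) -> str:
    """Build a compact identifier such as 'doi:10.1111/febs.13237'."""
    return f"{prefix.lower()}:{accession}"


def identifiers_org_url(prefix: str, accession: str) -> str:
    """Build a resolvable Identifiers.org URL from a prefix and an accession."""
    return "https://identifiers.org/" + compact_identifier(prefix, accession)


# The article DOI cited in this deck, expressed as an Identifiers.org URL.
article_url = identifiers_org_url("doi", "10.1111/febs.13237")
```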
Accessibility
persistent ids, versions, snapshots
Author List: Joe Bloggs; Jane Doe
Title: My Investigation
Date: September 2016
DOI: https://doi.org/10.15490/seek##
https://doi.org/10.15490/seek.1.investigation.56
Active entry evolves
Version
Fenner et al, A Data Citation Roadmap for Scholarly Data Repositories
doi: https://doi.org/10.1101/097196
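A snapshot DOI can be resolved to citation metadata as well as to a landing page. The sketch below only prepares an HTTP request that asks the DOI resolver for CSL JSON through content negotiation; it does not send it. Whether a given DOI returns metadata depends on its registration agency, and the function name is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def csl_json_request(doi: str) -> Request:
    """Prepare (but do not send) a request for a DOI's citation metadata."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": CSL_JSON})


# The snapshot DOI shown on the slide.
request = csl_json_request("10.15490/seek.1.investigation.56")
```

Passing the prepared request to `urllib.request.urlopen` would perform the resolution.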
Catalogue Interoperability
ISA based Research Object Packaging & Exchange
Author List: Joe Bloggs; Jane Doe
Title: My Investigation
Date: September 2016
DOI: https://doi.org/10.15490/seek##
information travels with the data and models
https://doi.org/10.15490/seek.1.investigation.56
Active entry evolves
Version
Research Object packaging
Standards-based metadata framework for manifests and containers
Metadata for bundling resources scattered across repositories
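A minimal sketch of the packaging idea: a zip container holding a manifest that aggregates resources by URI, so that scattered content can travel together. The layout is loosely modelled on the Research Object Bundle convention (a `mimetype` entry plus `.ro/manifest.json`); the exact field names are assumptions to check against the specification, and the function name is hypothetical.

```python
import io
import json
import zipfile


def build_ro_bundle(resource_uris):
    """Return the bytes of a zip bundle whose manifest aggregates the URIs."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # Each aggregated resource may live in a different repository.
        "aggregates": [{"uri": uri} for uri in resource_uris],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # The mimetype entry goes first and is stored uncompressed.
        bundle.writestr(
            zipfile.ZipInfo("mimetype"), "application/vnd.wf4ever.robundle+zip"
        )
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


bundle_bytes = build_ro_bundle(["https://fairdomhub.org/investigations/56"])
```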
Catalogue Interoperability
Linked Data Inside and Out
Lower friction of
semantic
annotation
Flexibly represent
different types of
data
Extract and
catalogue
metadata
Define relationships, cross-
link, aggregate, query
data
standard
based Excel
templates
Finding and Accessing Modalities
Lucene
Search Query
Linked Data
SPARQL
endpoint
Browse
Navigate
ISA Structure
API
XML
Linked
Data
Content
negotiation
Research
Object
Zipfile bundle + Linked
Data
Researchobject.org
Resolve
DataCite
Identifiers.org
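The SPARQL modality can be illustrated with a prepared query request. The endpoint address is a placeholder and the query assumes entries carry a Dublin Core title; neither is confirmed by the slides. The request follows the standard SPARQL protocol (a `query` parameter and a JSON results media type) and is built without being sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, not the real FAIRDOMHub address.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?entry ?title
WHERE { ?entry dcterms:title ?title . }
LIMIT 10
"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL SELECT request over HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


sparql = sparql_request(ENDPOINT, QUERY)
```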
Catalogue Interoperability: Samples
a key cross cutting issue in catalogues
compliant
Interoperability: Catalogues & Repos.
ELIXIR/GO-FAIR FAIR Datapoint
DCAT
description
https://dtl-fair.atlassian.net/wiki/display/FDP/FAIR+Data+Point+Software+Specification
output: standardized metadata as
Linked Data
input: URL
FAIR API
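As a sketch of the "input: URL, output: standardized metadata as Linked Data" idea, a catalogue-level DCAT description can be written as JSON-LD. The function name and the chosen properties are illustrative only and do not reproduce the FAIR Data Point specification.

```python
import json


def catalogue_description(title, homepage, dataset_urls):
    """Describe a catalogue and its datasets with DCAT terms as JSON-LD."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@id": homepage,
        "@type": "dcat:Catalog",
        "dcterms:title": title,
        "dcat:dataset": [
            {"@id": url, "@type": "dcat:Dataset"} for url in dataset_urls
        ],
    }


description = catalogue_description(
    "FAIRDOMHub",
    "https://fairdomhub.org",
    ["https://fairdomhub.org/investigations/56"],
)
serialized = json.dumps(description, indent=2)
```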
How FAIR? Pretty good.
FAIR Catalogue for EOSC
Discussion Points
• FAIR at different levels
– The Catalogue vs The Content
• Minimum information: Just enough and no more
• Balance common metadata types with specific
types
– Divide and conquer drill down
– Library view? Project view? Science view?
– Commonality vs Imposing
• Use Commodity infrastructure and protocols
– For harvesting, indexing, validation, search
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata to EOSC

Editor's Notes

  • #2 REUSE flavour. FAIRDOM: FAIR asset management and sharing experiences in Systems Biology. Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure ranging across European programmes (SysMO and EraSysAPP ERANets), national initiatives (de.NBI, German Virtual Liver Network, UK SynBio centres) and PIs' labs. It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in affecting sharing using behavioural interventions. http://www.fair-dom.org http://www.fairdomhub.org http://www.seek4science.org
  • #4 30 installations
  • #5 80+ projects
  • #7 Metadata catalogue and a data catalogue
  • #9 Linking, “Packaging” & Citing Codes, Data, Models, SOPs, Samples, Strains, Articles, People, Projects….
  • #12 Including MAMO. JERM is the bridge from data to catalogue
  • #13 Including MAMO. JERM links content to catalogue
  • #16 Decision making depends on evaluation and confidence on models – parameters, assumptions, uncertainty
  • #19 DOI resolution
  • #20 Samples are inputs and outputs
  • #23 We also have subscriptions: Open Science as a Service notification broker