How FAIRsharing can help FAIRify
your standards, databases and
data policies
Peter McQuilton, PhD
ORCiD: 0000-0003-2687-1982 | Twitter: @Drosophilic
ENVRI-FAIR training webinar, Friday 13th December, 2019
Slides:
https://datareadiness.eng.ox.ac.uk
• Introduction to FAIR
• FAIRsharing – helping to FAIRify standards,
databases and data policies
• Connections – building a FAIR ecosystem
Outline
What is it and what
can it do for you?
• Findable
• Discoverable on the web
• Uses globally unique, resolvable and persistent identifiers (e.g. DOI)
• Accessible
• Clearly defined access and security protocols (e.g. for sensitive data, like patient
samples)
• Interoperable
• Machine-actionable
• Community-adopted standards (e.g. formats, guidelines)
• Linked with other resources, shares data
• Reusable
• Clear licensing, data provenance, uses community standards and stored
appropriately
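As a small illustration of the "globally unique, resolvable and persistent identifiers" point above, the sketch below turns a DOI into a resolvable link through the https://doi.org/ resolver. The function name `doi_to_url` and the accepted input forms are assumptions made for this example, not part of the talk.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ URL for a DOI.

    Accepts a bare DOI, a 'doi:' prefixed DOI, or an existing doi.org URL
    (hypothetical helper, written for illustration only).
    """
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10." and has a suffix after "/".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix separator.
    return DOI_RESOLVER + quote(cleaned, safe="/")


if __name__ == "__main__":
    # DOI of the FAIR principles paper cited in these slides.
    print(doi_to_url("10.1038/sdata.2016.18"))
```

The same DOI can then be cited, stored in metadata, and followed by both people and machines, which is what makes it useful for findability.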
Lots of data aren’t FAIR
10.1038/ng.295
• Not always well cited or stored
• Software, codes, workflows are hard(er) to get hold of
• Poorly described for third party reuse
• Different level of detail and annotation
• Curation, reporting and annotation activities
are perceived as time consuming
• Sometimes rushed and minimally done if professional curation is
not available
Not FAIR: Low ‘findability’
and interoperability
Principles put emphasis on enhancing the ability of machines to automatically find
and use the data, in addition to supporting its reuse by individuals
FAIR principles – built on metadata
10.1038/sdata.2016.18
• License your digital object
• Without a license, an object cannot be reused
• Which license?
• The most permissive you can
• License doesn’t mean open, just provides a
framework for use
• Publish your data – retain ownership but allow
others to reuse, with attribution and credit
Licensing
• A community repository
• Trusted and vetted by the community
• Funded/sustainable
• Has a clear data management and sustainability
plan
• Uses community-adopted standards
• With the appropriate license
• Use standards and repositories that have been
endorsed by funders, journal publishers, other
organisations (e.g. ELIXIR)
Put your data somewhere FAIR
Blomberg N and ELIXIR Consortium.
ELIXIR position paper on FAIR data management in the life sciences
[version 1; not peer reviewed].
F1000Research 2017, 6(ELIXIR):1857 (document)
10.7490/f1000research.1114985.1
• Findable – use PID schemas, use schema.org
mark-up, add metadata to FAIRsharing
• Accessible – define the level of openness; state the
access protocol and license clearly in a policy
findable from the homepage
• Interoperable – use community standards for
reporting, models, formats and terminologies
• Reusable – licensing, provenance of data,
follow reporting standards; clear policy linked
from the homepage
Ways to help make your repository,
standard or data policy FAIR
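To make the "use schema.org mark-up" advice concrete, here is a minimal sketch that builds a schema.org `Dataset` description as JSON-LD. The helper name `dataset_jsonld`, the example DOI `10.1234/example` and the example URLs are placeholders invented for this illustration; only the schema.org property names (`name`, `description`, `identifier`, `license`, `url`) are real.

```python
import json


def dataset_jsonld(name, description, doi, license_url, landing_page):
    """Build a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; a real repository would usually
    generate this from its own metadata store.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Persistent identifier, expressed as a resolvable URL.
        "identifier": f"https://doi.org/{doi}",
        # An explicit license supports the Reusable principle.
        "license": license_url,
        "url": landing_page,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    print(dataset_jsonld(
        name="Example observations",
        description="Placeholder dataset used to illustrate the mark-up.",
        doi="10.1234/example",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        landing_page="https://repository.example.org/datasets/1",
    ))
```

The resulting JSON is typically embedded in the landing page inside a `<script type="application/ld+json">` element so that search engines and other machines can read it.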
FAIRify your standards,
databases and data policies
10.1038/s41587-019-0080-8
@FAIRsharing_org
COMMUNITY STANDARDS
REPOSITORIES,
databases and
knowledgebases
DATA POLICIES
by funders, journals
and other organizations
Curated inter-linked
descriptions
informative and educational resource
We guide consumers to discover, select and use these
resources with confidence
We help producers to make their resources more visible,
more widely adopted and cited
Providing rich descriptive metadata for
resources
Highlighting relationships with standards,
databases and data policies
“The interactive browser will allow us to discover which databases and standards
are not currently included in our author guidelines, enabling us to regularly
monitor and refine our policies as appropriate, in support of our mission to help
our authors enhance the reproducibility of their work.”
H. Murray. Publishing Editor, F1000Research
Mapping the landscape of badges and
certification
Mapping the landscape – collections of resources
• Redesigning the data model
• Splitting databases into repositories and knowledgebases
• Adding more fields to each record
• Adding more network graph tools
• Adding better search and manipulation tools
Your ideas are welcome!
FAIRsharing redesign –
what’s coming next?
Ensures that standards, databases, repositories and policies are:
• Findable, e.g., by providing DOIs and marking up records in schema.org,
allowing users to register, claim, maintain, interlink, classify, search and
discover them
• Accessible, e.g., identifying their level of openness and/or license type
• Interoperable, e.g., highlighting which repositories implement the same
standards to structure and exchange data
• Reusable, e.g., knowing the coverage of a standard and its level of
endorsement by a number of repositories should encourage its use or
extension in neighboring domains, rather than reinvention
FAIRsharing enables the FAIR principles
10.1038/s41587-019-0080-8
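The Interoperable point above, "highlighting which repositories implement the same standards", can be pictured as a simple index over relationship records. The sketch below is a toy model only: the repository and standard names are invented, and the structure is an assumption for illustration, not FAIRsharing's actual data model or API.

```python
from collections import defaultdict

# Hypothetical relationship records: repository -> standards it implements.
IMPLEMENTS = {
    "RepoA": {"FormatX", "ChecklistY"},
    "RepoB": {"FormatX"},
    "RepoC": {"TerminologyZ"},
}


def repositories_by_standard(implements):
    """Invert the mapping: standard -> set of repositories implementing it."""
    index = defaultdict(set)
    for repo, standards in implements.items():
        for standard in standards:
            index[standard].add(repo)
    return index


def shared_standards(implements, repo_a, repo_b):
    """Standards implemented by both repositories (a rough interoperability hint)."""
    return implements[repo_a] & implements[repo_b]


if __name__ == "__main__":
    index = repositories_by_standard(IMPLEMENTS)
    print(sorted(index["FormatX"]))
    print(shared_standards(IMPLEMENTS, "RepoA", "RepoB"))
```

A standard that appears against many repositories in such an index is a candidate for reuse or extension rather than reinvention, which is the Reusable argument made on the same slide.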
Researchers in
academia, industry,
government
Developers and
curators of
resources
Journal publishers or
organizations with
data policy
Research data
facilitators, librarians,
trainers
Learned societies,
unions and
associations
Funders and data
policy makers
A flagship output of the:
Recommended by
funders, e.g.:
Connections – building a
FAIR ecosystem
FAIRsharing
WG
registry recommendations
10.15497/RDA00030
https://doi.org/10.1038/s41587-019-0080-8
Open Access CC-BY
69 authors (adopters, collaborators, users)
representing different stakeholder groups
Analysed the data policies by
journals/publishers, and the standards and
repositories they recommend
Working with journal editors and publishers
What have we learned and what are we doing
now?
Discrepancy in recommendations across the data policies
• some repositories are named, but very few standards are
• cautious approach due to the wealth of existing resources
Recommendations are often driven by
• the editor’s familiarity with one or more standards, notably
for journals or publishers focusing on specific disciplines
• the engagement with learned societies and researchers
actively supporting and using certain resources
⮚ Consensus: FAIRsharing plays a key role in helping editors
to discover and recommend appropriate resources, but
repositories and standards could be more FAIR!
We propose a set of criteria that journals
and publishers believe are important for the
identification and selection of data
repositories, which can be recommended to
researchers when they are preparing to
publish the data underlying their findings
Data Repository Selection: Criteria That Matter
Pre-print:
https://osf.io/m2bce
Started Jan 2018
Objectives
1. Guide journals and publishers in providing authors
with consistent recommendations on data deposition
2. Reduce potential for confusion of researchers and
support staff
3. Inform data repository developers and managers of
the features believed to be important by journals and
publishers
4. Apprise certification and other evaluation initiatives,
serving as a reference and perspective from journals
and publishers
5. Drive the curation in FAIRsharing, which will enable
display, filter and search based on these criteria
Foreseen impact and next steps
Our work will also drive changes by:
• Defining a common language across publishers;
• Helping publishers to maintain this information in a more
automated way;
• Making the process for selection of recommended
repositories more transparent to all stakeholders
The criteria are available and we are ready for your
feedback – https://tinyurl.com/RepoCriteriaFeedback
Once agreed, we will add the criteria into FAIRsharing
10.1162/dint_a_00037
StRePo
Metabolomics
Chemistry
Food-System
FAIR Curriculum
OPEDAS
TRAIN
Personal Health Train
Matrix
http://w3id.org/AmIFAIR
10.1162/dint_a_00038
10.1162/dint_a_00037
10.1038/s41597-019-0184-5
Connections
• GO-FAIR Matrix – mapping the landscape
• A terminology for data stewardship and FAIR
curricula
• https://terms4fairskills.github.io/
• Discover resources that measure and improve FAIRness
• https://www.fairassist.org
FAIR StRePo Projects
• Help us map the IN Matrix
• Register your repositories and standards in
FAIRsharing
• Create a Collection for your IN
• We are looking for more terminology annotators
• Contact terms4FAIRskills@codata.org
• Tell us what’s missing
• Register your resource/questionnaire
Join us!
The use of community standards for (meta)data and identifiers
is among the FAIRness indicators
FAIRsharing content powers 2 (semi)automatic evaluation tools:
https://doi.org/10.1101/657676
https://doi.org/10.1038/s41597-019-0184-5
Collaboration: and
FAIR evaluation tools
The FAIR Evaluator
Designed as a bottom-up community effort, building on
‘generation 1’ FAIR metrics (human entry) to create ‘generation
2’ FAIR maturity indicators that use FAIRsharing metadata
http://w3id.org/AmIFAIR
10.1038/s41597-019-0184-5
Community driven
● Communities can decide which Maturity Indicators
are relevant to them (working with FAIR data
maturity model)
● These are registered in the Evaluator as a
“Collection”, with some documentation about what
MIs are included, and to what communities the
Collection would be relevant
○ the purpose being re-use of Indicator Collections between
similar communities/agencies
○ Anyone can execute an evaluation on any GUID
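The idea of a community choosing its own Maturity Indicators and registering them as a documented "Collection" can be sketched as a small data structure. Everything below is hypothetical: the class names, the two example indicators and the metadata keys are assumptions for illustration, and the code evaluates a metadata record passed in directly, whereas the real FAIR Evaluator works from a GUID. It is not the Evaluator's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]


@dataclass(frozen=True)
class MaturityIndicator:
    """One automated test applied to a resource's metadata (hypothetical)."""
    name: str
    test: Callable[[Metadata], bool]


@dataclass
class Collection:
    """A documented set of indicators chosen by one community (hypothetical)."""
    title: str
    community: str
    indicators: List[MaturityIndicator]

    def evaluate(self, metadata: Metadata) -> Dict[str, bool]:
        """Run every indicator in the collection and report pass/fail per indicator."""
        return {mi.name: mi.test(metadata) for mi in self.indicators}


# Two illustrative indicators; real Maturity Indicators are far more detailed.
has_persistent_id = MaturityIndicator(
    "persistent identifier",
    lambda m: str(m.get("identifier", "")).startswith("https://doi.org/"),
)
has_license = MaturityIndicator(
    "explicit license",
    lambda m: bool(m.get("license")),
)

example_collection = Collection(
    title="Example collection",
    community="Placeholder community",
    indicators=[has_persistent_id, has_license],
)

if __name__ == "__main__":
    # Placeholder metadata record with an identifier but no license.
    print(example_collection.evaluate({"identifier": "https://doi.org/10.1234/example"}))
```

Because a collection is just a named list of indicators plus documentation, a similar community can reuse it as is or copy it and swap individual indicators.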
• Descriptions of community standards
(metrics, identifiers, terminologies,
ontologies, reporting guidelines, models,
formats, schemas), repositories
(including knowledge bases), and
policies (from funders, journals and other
entities) and the relationships between
these resources
• Manually curated descriptions defined
with and vetted by the resource
maintainers themselves
• Citable/resolvable DOIs for all records
• Open and FAIR data, accessible by API
or via web interface
Landscape of standards and
repositories used by the
GO FAIR Chemistry IN
FAIRsharing Metadata – powering the DSW
questionnaires
Semantically-enabled drop-down
menus and auto-complete functions
(using data from FAIRsharing)
FAIRsharing Metadata – powering the DSW
questionnaires
• Many of us, as well as many stakeholders (incl.
publishers and funders) have been doing and supporting
FAIR things before FAIR was a thing
• We need to reconcile views and needs, not just on paper
• Make this ecosystem participatory; easier said than done
Gaps and hurdles
http://blogs.nature.com/scientificdata/2019/10/22/the-layered-cake/
FAIRsharing enables the FAIR principles
• Ensures that standards, repositories and policies are:
• Findable - by providing DOIs and marking up metadata
records with schema.org
• Accessible - by identifying their level of openness, license
type and other information in the metadata
• Interoperable – highlighting which repositories implement the
same standards for structuring and exchanging data
• Reusable – knowing the level of endorsement of a
repository by publisher data policies, and the level of
implementation of a standard by repositories, encourages
their use rather than reinvention
10.1038/s41587-019-0080-8
datareadiness.eng.ox.ac.uk
with developers and curators

FAIRsharing - ENVRI-FAIR Webinar
