FAIRy stories: the FAIR Data
principles in theory and in practice
Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
The views expressed in this talk are my own
NSF Convergence Accelerator Series Tracks A&B webinar, 19th May 2021
March 18, 2021
http://spatial.ucsb.edu/2021/Natasha-Noy
Why do we need FAIR data in Research?
“there must be loads of legacy data. We’re desperately trying to go
back and look at what we knew from SARS 10 years ago”
https://www.covid19dataportal.org/
https://www.rd-alliance.org/group/rda-covid19-rda-covid19-omics-rda-covid19-epidemiology-rda-covid19-clinical-rda-covid19-1
https://doi.org/10.15497/rda00052
Why do we need FAIR data in Research?
COVID Data sharing boost – mobilising people, infrastructure & initiatives
Spotlighted technical, territorial & practice issues
Provider: collection, upload and governance bottlenecks
User: finding and accessing datasets; licenses; data and metadata quality
Access to data for processing at scale, common standards
Behavioural inertia and relapse
Long term sustainability
“global pandemic is not sufficient to radically modify
scientific practices”*
* Larregue et al. https://blogs.lse.ac.uk/impactofsocialsciences/2020/11/30/covid-19-where-is-the-data/
https://www.nature.com/articles/d41586-021-00305-7
https://www.nature.com/articles/s41597-020-0524-5
Why do we need FAIR data in Research?
information flows, secondary use
Figure: Knowledge Turning, Information Flow, Josh Sommer, Chordoma Foundation, 2011
Community domain enclaves
Resource fragmentation
Flow across platforms / sovereignties
Pan-discipline drivers
Knowledge churn, loss and cost
2016
A set of GUIDING PRINCIPLES to
enhance the value of all digital
resources and their reuse by PEOPLE
and by MACHINES
ALIGNING a COMMUNITY around
common data guidelines
FAIR Research Data
branding a trend
(re)-stimulating a
movement
What ARE the FAIR principles?
Aspirational guardrails
Neither a standard nor metrics
A contract between data
provider and user
In the original paper
https://www.go-fair.org/fair-principles/
Relaunch a dialogue - research and policy communities.
Reboot a journey - wider accessibility and reusability of data.
compare &
combine data
https://doi.org/10.1038/sdata.2016.18
“enhancing the ability of machines to
automatically find and use data or any digital
object, and support its reuse by individuals”
INCF Statement
Persistent identifiers
Globally unique, resolvable for
data and always for metadata
Structured metadata
Community defined descriptive
metadata using common
terminologies and standards
Linked Data
Vocabularies are themselves FAIR; (meta)data
reference other (meta)data; provenance
Automation-readiness
Access protocols
Open, free and universally
implementable comms protocols
Semantic Web ->
Linked Data ->
Knowledge Graphs.
Machine-processable
metadata.
[Icons: FAIRsharing]
Open as possible, Closed as necessary
Clear licences for innovation and reuse
Sensitive data, GDPR, IPR, jumpy Deans.
Crossing sovereignty boundaries
• Data sharing becomes data visiting &
federated analysis
An industry in controlled, secure access…
• Data Usage Ontology, Beacon Passports,
Trusted Research Environments, etc.
Terms of access and use: FAIR ≠ OPEN
FAIR OPEN
SAFE
Privacy preservation
Regulatory rigour
FAIR Implicit Assumptions & Implications
Data are first class objects
Primarily aimed at data creators
and providers for the benefit of
consumers.
Operating in an (Open) Data
Ecosystem.
Adoption at scale in legacy
settings.
Data sharing
The Life Sciences & pan-European scale data infrastructure
The Life Sciences Infrastructure Zoo
Flows around a Federated & Diverse System
1466 data repositories
(100+ in EOSC-Life)
916 data format and metadata
standards*
from compounds to clinical trials
https://fairsharing.org/ accessed May 2021
Common standards & agreements
mappings of PIDs and metadata
moving metadata around
accountability and responsibility
FAIR players simplified
Researchers and
company
scientists who
generate and use
the data
Service providers
who manage data
and infrastructure
Local -> Global level
Public -> Commercial
Authorities who
drive policy, practice
& resources
Funders, Policy makers,
Publishers, Professional
societies, Standards
organisations, Institutions
Global and national initiatives
Dedicated projects
Community Orgs
Funders
Policy
Publishers
FAIR
first
stage
Dedicated Services
Where we are going
Where we are
[Susanna Sansone]
FAIR
first
stage
FAIR first stage:
Policymakers, Data service providers
How to define, measure compliance and certify FAIR data?
What is a dataset?
General repos vs Curated authoritative archives?
Principles for Data Repositories
https://www.rd-alliance.org/trust-principles-rda-community-effort
https://fairassist.org/
https://www.natureindex.com/news-blog/what-scientists-need-to-know-about-fair-data
Open Data Survey, 2019
81% of researcher
respondents
unfamiliar with FAIR
1. A common mechanism for metadata
Respect and work with the huge legacy
resources: repositories, registries, tools …
community standards
Find, register, index, search resources
Move metadata between services
without APIs
Repositories -> Tools, Aggregators (e.g. licenses)
-> Registries (upload, auto-curation)
Registries -> Registries (across disciplines)
Contribute to Knowledge Graphs
a little bit of semantics at scale
semantic underware
invisible to users
visible to developers & services
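Moving metadata between services without APIs works because consumers harvest the mark-up that providers embed in their own web pages. A minimal sketch of the harvesting side, assuming the mark-up is embedded as JSON-LD inside `<script type="application/ld+json">` elements; the page and the class name are hypothetical, and production harvesters also handle microdata and RDFa:

```python
import json
from html.parser import HTMLParser


class JsonLdHarvester(HTMLParser):
    """Collect every JSON-LD block embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; buffer it all
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.records.append(json.loads("".join(self._buffer)))


def harvest(html: str) -> list:
    """Return the parsed JSON-LD records found in html."""
    parser = JsonLdHarvester()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical provider page
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body>...</body></html>
"""

for record in harvest(page):
    print(record["@type"], "-", record["name"])
```

The provider only publishes pages; the harvester needs no agreement with it beyond the shared vocabulary, which is what keeps the semantics "invisible to users, visible to developers & services".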
Picture: Carole Goble, Turing Lecture 2018
Schema.org: Semantic Mark up for the Web
Cartel of commercial search engines
Wide web use, web infrastructure
Web pages and sitemaps
Types (830+), e.g. IceCreamShop
Properties (1300+), e.g. hasMenu
Not targeted at science - too much / too little
Dataset type – 120 properties
(Google's Dataset profile requires only 2 properties)
No type for Protein, Gene, Taxon
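To make the "too much / too little" point concrete: a schema.org Dataset can be marked up with very little. A minimal sketch, assuming (as Google's Dataset structured-data guidance states) that `name` and `description` are the required properties; the function name and the example values are hypothetical:

```python
import json


def dataset_markup(name: str, description: str, **optional) -> str:
    """Build a schema.org Dataset as a JSON-LD <script> element.

    name and description are the two required properties; any other
    schema.org Dataset property can be passed as a keyword argument.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    record.update(optional)
    body = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(dataset_markup(
    name="Example assay results",  # hypothetical dataset
    description="Hypothetical example used to illustrate the mark-up.",
    license="https://creativecommons.org/licenses/by/4.0/",
))
```

Nothing in this record says what the data is about biologically, which is the gap Bioschemas profiles address.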
Harnessing Schema.org for Bioscience
Profile
Data model
Marginality information
Controlled vocabularies
Cardinality
Documentation
Examples
New (properties | types)
definition & consensus
deployment and use
tools & support
Opinionated conventions
Profiles & Link to domain ontologies
Add Bioscience properties & types if necessary
Examples & Usage Guidelines
Community
Harnessing Schema.org for Bioscience
ChemicalSubstance
definition & consensus
deployment and use
tools & support
Opinionated conventions
Profiles & Link to domain ontologies
Add Bioscience properties & types if necessary
Examples & Usage Guidelines
Community
Bioschemas metadata stratification
broad & shallow / deepish & narrowish
Generic
Subject
specific
MolecularEntity,
Protein,
Sample, Taxon,
ChemicalSubstance…
DataCatalog
Dataset
Dataset profile: 5 minimum, 8
recommended properties
license & provenance
https://bioschemas.org/profiles/
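A profile's minimum properties can be checked mechanically before mark-up is deployed. A minimal sketch, assuming the five minimum properties of the Bioschemas Dataset profile are `name`, `description`, `identifier`, `keywords` and `license`; that list is our reading of one profile release and should be checked against the current version at bioschemas.org/profiles. The record values are hypothetical:

```python
# Assumed minimum properties of the Bioschemas Dataset profile;
# check the current profile release before relying on this list.
DATASET_MINIMUM = {"name", "description", "identifier", "keywords", "license"}


def missing_minimum(record: dict, required=DATASET_MINIMUM) -> list:
    """Return the required properties that are absent or empty, sorted."""
    return sorted(
        prop for prop in required
        if record.get(prop) in (None, "", [])
    )


record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example dataset",  # hypothetical values throughout
    "description": "Illustrative record only.",
    "identifier": "https://example.org/dataset/1",
    "keywords": ["example"],
}
print(missing_minimum(record))  # ['license']
```

Checking only the minimum set keeps the barrier to adoption low; recommended properties can be reported as warnings rather than failures.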
Crosswalks to metadata schemas *
• DCAT, DataCite, Crossref, OpenAIRE, DDI
• dct:issued <-> schema:datePublished
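At its simplest, a crosswalk is a term-to-term table applied to a record. A minimal sketch, assuming flat records keyed by prefixed property names; only the `dct:issued` <-> `schema:datePublished` row comes from the slide, and the other rows are common DCAT / Dublin Core to schema.org correspondences added for illustration, to be checked against the published crosswalk:

```python
# Term-level crosswalk from DCAT / Dublin Core terms to schema.org.
DCAT_TO_SCHEMA = {
    "dct:issued": "schema:datePublished",
    "dct:title": "schema:name",
    "dct:description": "schema:description",
    "dct:license": "schema:license",
    "dcat:keyword": "schema:keywords",
}


def crosswalk(record: dict, table: dict) -> tuple:
    """Rename keys via table; return (mapped, unmapped) so nothing is silently dropped."""
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in table:
            mapped[table[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


def invert(table: dict) -> dict:
    """Reverse the direction; only valid because this table is one-to-one."""
    return {target: source for source, target in table.items()}


dcat_record = {
    "dct:title": "Example dataset",  # hypothetical record
    "dct:issued": "2021-05-19",
    "dct:accrualPeriodicity": "monthly",  # no schema.org row in this table
}
schema_record, leftovers = crosswalk(dcat_record, DCAT_TO_SCHEMA)
print(schema_record)
print(leftovers)
```

Returning the unmapped properties alongside the mapped ones makes any information loss visible instead of hiding it in the conversion.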
What is a dataset?
Include community ontologies
• Type: ChemicalSubstance
• Property: biologicalRole
• ExpectedType: ChEBI ontology
* https://zenodo.org/record/4420116
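Linking a type's property to a community ontology means the value is a term from that ontology rather than free text. A minimal sketch, assuming (as we read the current schema.org definitions) that `biologicalRole` is defined on BioChemEntity, of which ChemicalSubstance is a subtype, and expects a `DefinedTerm`; the helper name, substance name and the accession `CHEBI:00000` are hypothetical placeholders:

```python
CHEBI_PURL = "http://purl.obolibrary.org/obo/CHEBI_"


def chebi_term(curie: str, label: str) -> dict:
    """Wrap a ChEBI compact identifier (e.g. 'CHEBI:12345') as a DefinedTerm."""
    prefix, _, local_id = curie.partition(":")
    if prefix != "CHEBI" or not local_id.isdigit():
        raise ValueError(f"not a ChEBI identifier: {curie!r}")
    return {
        "@type": "DefinedTerm",
        "termCode": curie,
        "name": label,
        "url": CHEBI_PURL + local_id,  # resolvable OBO PURL for the term
        "inDefinedTermSet": "https://www.ebi.ac.uk/chebi/",
    }


substance = {
    "@context": "https://schema.org",
    "@type": "ChemicalSubstance",
    "name": "Example substance",  # hypothetical
    # Placeholder accession: substitute the real ChEBI role term
    "biologicalRole": chebi_term("CHEBI:00000", "example role"),
}
print(substance["biologicalRole"]["url"])
```

Rejecting values that are not ChEBI identifiers is one way to enforce the profile's expected type at mark-up time rather than leaving it to consumers.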
Bioschemas in numbers (bioschemas.org/liveDeploys):
400+ people, 20+ countries
22 types, 32 profiles
65 sites, 60M+ pages, 120 profile deployments
Bioschemas Village (figure): MolecularEntity, ChemicalSubstance, Gene, Protein, Taxon and Dataset mark-up; Toxicology Data Aggregator
[with thanks: Egon Willighagen]
Lessons: Putting FAIR into Practice
A little bit of semantics at scale -> build critical mass
Profiles
• Schema.org culture – Catch-22
• Consensus building, retention & Ontology-itis
Provider mark-up
• Developer-friendly in-house tools & wacky web implementations
• Adoption incentives & costs of adapting database processes
Consumer services
• Adoption incentives – Catch-22 & tipping points
• DataCatalog & Dataset popular -> Google Dataset search
Consumer-provider readiness
• Tools and training community take-up…
2. Packaging Research Objects
Gather files, unbounded references &
other crates together into a “crate”.
FAIR content: metadata,
identifiers, provenance, citation
about the content
FAIR crates: metadata, PIDs,
provenance, citation about the
crate.
more FAIR middleware -> towards FAIR Digital Objects*
*FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units:
https://doi.org/10.3390/publications8020021
Why “crate up” objects? FAIR+R
Flows:
Researchers work with multiple and
different objects using multiple
infrastructures over periods of time
exchange between platforms and people
Parts:
Research has associated objects
linked together by context
metadata files with files
datasets, scripts, SOPs, articles …
held in different places
made at different times by
different people & processes
publish, report, reuse, cite, reproduce
register, deposit, archive, port
point to big, sensitive & active content
Aggregate files and/or any URI-addressable
content with structured metadata
Web and Linked Data Native
Machine- and human-readable: PIDs + JSON-LD +
Schema.org; search-engine & developer friendly
Flexible for open-ended content; respects legacy
typed by a profile + add more schema.org and
domain ontologies
http://www.researchobject.org/ro-crate/
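The core of a crate is a single JSON-LD file describing the package and its parts. A minimal sketch, assuming the RO-Crate 1.1 conventions (a metadata descriptor entity, a root `Dataset` with `name`, `description`, `datePublished` and `license`, and one entity per part); the function name and every value are hypothetical, and the specification remains the authority on required properties:

```python
import json


def make_crate(name, description, date_published, license_url, parts):
    """Build a minimal RO-Crate 1.1 metadata document as a Python dict.

    parts is a list of (id, type, name) tuples. An id can be a relative
    path to a file packed inside the crate, or an absolute URI for
    content held elsewhere (big, sensitive or still-changing data).
    """
    part_entities = [
        {"@id": part_id, "@type": part_type, "name": part_name}
        for part_id, part_type, part_name in parts
    ]
    descriptor = {
        # The metadata file describes itself and points at the root entity
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        # The root data entity: the crate as a whole
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": e["@id"]} for e in part_entities],
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *part_entities],
    }


crate = make_crate(
    name="Example analysis",  # all values below are hypothetical
    description="Illustrative crate bundling local files with a remote dataset.",
    date_published="2021-05-19",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    parts=[
        ("data/results.csv", "File", "Results table"),
        ("scripts/analysis.py", "File", "Analysis script"),
        ("https://example.org/datasets/large-raw-data", "File", "Raw data held remotely"),
    ],
)
# Saved as ro-crate-metadata.json at the top level of the crate directory
print(json.dumps(crate, indent=2))
```

Because the file is plain JSON-LD with schema.org terms, the same document can be zipped with the files as an archive, served on the web, or loaded into a knowledge graph without conversion.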
Archive file
format
FAIR Object Middleware
FAIR Middleware
metadata-carrying interchange format
Knowledge Graph of Research Objects
It’s FAIR metadata middleware, stupid
• smart use of wheels already invented
• get tools, services on board
• developer friendly, firm best practice
Known and Unknown unknowns
One size does not fit all
• contextual interpretation
• descriptive openness, multiple interpretations
Analogous to FAIR Software
• RDA/ReSA FAIR4Research Software WG
Lessons: Putting FAIR into Practice
3. Making (legacy) datasets FAIR: FAIRification
[Picture credit: Egon Willighagen]
Credit to: Ian Harrow, FAIR & OM projects
FAIR as enabler for the digital transformation
● Biopharma R&D productivity can be
improved by implementing the FAIR Data
Principles.
● FAIR gives powerful new AI analytics access
to data for machine learning and prediction
● Fairly AI Ready
● Challenges
○ change the culture, show business value,
achieve the ‘FAIR enough’
○ Sustain FAIR solutions and activities
Slide credit: Susanna Sansone
Making (legacy) datasets FAIR: FAIRification
> 100 public-private partnership translational
projects (European Commission, universities, SMEs
and Big Pharma)
Pharma’s own datasets
*https://www.go-fair.org/how-to-go-fair/fair-data-point/
Data visiting through a
FAIR Data Point*
Linked Data / RDF tech
Dataset transformation
Methodology
Linkset services
RDF Warehouse (Knowledge Graph)
- API not SPARQL
- Sustainability & maintenance
- Linkset PID mapping services
FAIRification of legacy datasets
Practical
advice
Assessment
processes
FAIR levels of
projects / data
Selection of
datasets
Cost/Benefit
analysis
Methodology
Steps for 1 or
more datasets
Cultural change
Legal templates
Squads & BYODs
Maturity models
Interlinking data from different sources
The lessons of good
global and persistent
identifiers.
Mapping identifiers
and services for
mapping ids to ids and
concepts to concepts.
https://fairplus.github.io/the-fair-cookbook/content/recipes/interoperability/identifier-mapping.html
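Identifier mapping has two parts: expanding compact identifiers into resolvable URIs, and looking up which identifiers in one resource correspond to identifiers in another. A minimal sketch of both, assuming a hand-written prefix map and a linkset held as (source, target) pairs; the `dbA`/`dbB` prefixes, their namespaces and all accessions except the ChEBI one are hypothetical, and in practice curated registries such as identifiers.org supply the prefix map:

```python
from collections import defaultdict

# Hypothetical prefix map: compact prefix -> namespace URI
PREFIXES = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "dbA": "https://example.org/dbA/",  # hypothetical source database
    "dbB": "https://example.org/dbB/",  # hypothetical target database
}


def expand(curie, prefixes=PREFIXES):
    """Turn a compact identifier (prefix:local) into a full, resolvable URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no namespace registered for {curie!r}")
    return prefixes[prefix] + local_id


class Linkset:
    """Bidirectional index over (source, target) identifier pairs."""

    def __init__(self, pairs):
        self._forward = defaultdict(set)
        self._backward = defaultdict(set)
        for source, target in pairs:
            self._forward[source].add(target)
            self._backward[target].add(source)

    def map(self, curie, reverse=False):
        """Identifiers linked to curie; mappings may be one-to-many."""
        index = self._backward if reverse else self._forward
        return sorted(index.get(curie, ()))


# Hypothetical linkset between two databases
links = Linkset([
    ("dbA:0001", "dbB:X1"),
    ("dbA:0001", "dbB:X2"),  # one record in dbA maps to two in dbB
    ("dbA:0002", "dbB:X3"),
])

print(links.map("dbA:0001"))
print(links.map("dbB:X3", reverse=True))
print([expand(c) for c in links.map("dbA:0001")])
```

Returning a list rather than a single identifier keeps one-to-many mappings explicit, which is where most of the curation effort in mapping ids to ids tends to go.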
FAIR by Design
At the start of a collection, built in throughout the life cycle
change management, capacity building
FAIRifying Retrospectively
Legacy datasets, build a cohort,
cost benefit and FAIR readiness over a collection of datasets
Reality
FA(I)R
New FAIR variants
FAIR++
Legal > Organisational >
Semantic > Technical*
Business and change analysis.
Cost Benefit Analysis.
Scientific / Business Value
Sustainability
“…make a decision that
these data are valuable
enough to invest in the work
required for FAIRification.”
interoperability
*EOSC Interoperability Framework
What does FAIRifying a dataset mean?
A database? A PDF? Depositing to a public archive?
Identifier and ontology selecting, assigning,
mapping between and to existing vocabs, and knowing
about ontology services.
High-fidelity, lossless ETL: moving (meta)data
from one system to another
Lessons: Putting FAIR into Practice
FAIR enough.
Repository manager
Admin monitoring
Bioscientist
Scientific analysis
“Fairness does not mean everyone
gets the same. Fairness means
everyone gets what they need”
(Rick Riordan).
Maturity and importance spectrum
It’s not all worth it.
FAIR gardens + FAIR scrub
How to assess FAIR maturity
levels, not to be certified but
to make decisions.
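Using maturity levels to make decisions rather than to certify can be as simple as comparing an assessment against a target that depends on who needs the data. A toy sketch under stated assumptions: the indicators, the per-letter scoring and the "FAIR enough" target are all hypothetical and are not taken from any published maturity model:

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    principle: str  # one of "F", "A", "I", "R"
    name: str
    met: bool


def maturity_profile(indicators):
    """Fraction of indicators met for each FAIR letter that has indicators."""
    totals, met = {}, {}
    for ind in indicators:
        totals[ind.principle] = totals.get(ind.principle, 0) + 1
        met[ind.principle] = met.get(ind.principle, 0) + int(ind.met)
    return {p: met[p] / totals[p] for p in "FAIR" if p in totals}


def shortfall(profile, target):
    """Letters where the dataset falls below a use-case-specific target."""
    return [p for p in "FAIR" if p in target and profile.get(p, 0.0) < target[p]]


# Hypothetical assessment of one legacy dataset
assessment = [
    Indicator("F", "persistent identifier assigned", True),
    Indicator("F", "metadata registered in a searchable catalogue", True),
    Indicator("A", "retrievable over a standard open protocol", True),
    Indicator("I", "terms drawn from community ontologies", False),
    Indicator("I", "qualified links to related datasets", False),
    Indicator("R", "clear licence", True),
    Indicator("R", "provenance recorded", False),
]

# Hypothetical "FAIR enough" target for one analysis use case
target = {"F": 1.0, "A": 1.0, "I": 0.5, "R": 0.5}

profile = maturity_profile(assessment)
print(profile)
print("invest in:", shortfall(profile, target))
```

A repository manager monitoring a collection and a bioscientist running an analysis would set different targets for the same dataset, so the output is a list of where to invest, not a pass/fail grade.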
FAIR ≠ FREE - an expensive, expert team sport
Mostly manual,
mostly specific
“It is a truth
universally
acknowledged
that a
Knowledge
Graph must be
in want of FAIR
data.
And FAIR data
is in want of
Knowledge
Graphs.”
harvesting
added value
DataCite PID Graph
Bottlenecks:
identifiers and ontologies
curation and ingest pipelines of data providers
4. FAIR Data by Design at Source
Data management platform for Project Hubs
organising, cataloguing, sharing and publishing
multiple kinds of research objects in multiple
repositories for multi-partner projects.
Community developed Knowledge Hub
for guides, examples, tools, and pointers.
Assembled and written by Life Science
researchers and data stewards for their peers.
https://rdmkit.elixir-europe.org
https://fair-dom.org
Lessons: Putting FAIR into Practice
Data creators
• Retention not sharing, act local not global
• Advantage*: intimate knowledge, data
flirting, credits & incentives
Process change and values
• Access to infrastructure with seamless
information flows; values
• Time & resources to embed into practice
FAIR Stewardship skills
• Professionalisation & know-how
*Pasquetto, I. V., Borgman, C. L., & Wofford, M. F. (2019). Uses and Reuses of Scientific Data: The Data Creators’ Advantage. Harvard Data Science Review, 1(2). https://doi.org/10.1162/99608f92.fc14bf2d
Summary: FAIRy stories
Theory -> mobilised some
Practice -> marathon that takes a village
Move the story from data providers to
enabling creators & consumers to prepare
and share FAIR data -> Research on Research
Authorities Change Mgt
Stewardship
Service Providers
Sustained infrastructure
Acknowledgements
Special thanks to
• Stian Soiland-Reyes (Uni of Manchester/Uni of Amsterdam)
• Nick Juty & Ebtisam Alharbi (University of Manchester)
• Susanna Sansone (University of Oxford)
• Tony Burdett (EMBL-EBI)
• Ibrahim Emam (Imperial College)
• Egon Willighagen (Maastricht University)
• Alasdair Gray (Heriot-Watt University)
Manchester, Research Object, RDMkit, FAIRDOM, FAIRplus, Bioschemas colleagues
(about 130 people)
Icons from the Noun Project
(https://thenounproject.com/)
