pharmas and academia join forces to make data FAIR
12th Global Summit on Regulatory Science (GSRS22), Bioinformatics Session, Singapore, 20 Oct, 2022
Slides: https://www.slideshare.net/SusannaSansone
Professor of Data Readiness
Associate Director, Oxford e-Research Centre
Interoperability Platform Co-Lead, elixir-europe.org
Founding Academic Editor, nature.com/sdata
datareadiness.eng.ox.ac.uk
Susanna-Assunta Sansone
ORCiD: 0000-0001-5306-5690
Twitter: @SusannaASansone
Open, FAIR and reproducible science
The FAIR Principles
● Globally unique and persistent identifiers
● Community defined descriptive metadata
● Community defined terminologies
● Detailed provenance
● Terms of access
● Terms of use
Rationale behind the FAIR Principles
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#276a35e6f637
Discoveries are made using shared data, and this requires data that are:
● Cited and stored to be discoverable
● Retrievable and structured in standard format(s)
● Richly described to be understandable
Data preparation accounts for about 80% of the work of data scientists
FAIR has aligned the broad community around common guidelines
doi.org/10.2777/986252
www.gov.uk/government/publications/open-research-data-task-force-final-report
www.fair-access.net.au
doi.org/10.1787/25186167
ark:/48223/pf0000374837
doi.org/10.7486/DRI.tq582c863
https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
FAIR as a driver of the digital transformation in (bio)pharmas
● To improve biopharma R&D productivity
● To enable powerful new AI analytics to access data for ML and prediction
● Requirements
o financial, technical, training
● Challenges
o change the culture, show business value, achieve ‘FAIR enough’ on an enterprise scale
The FAIR Principles:
a continuum of features, attributes, behaviours
In practice, what do I need to do?
The FAIR Cookbook: motivations and ambitions beyond the hype
Motivations
● Large body of generic FAIR guidance
● Non-specific guidance for the life sciences
● Lack of practical examples of ‘how-to’ with different data types and scenarios
Ambitions
● Target specific situations to deliver a guide with applied examples
● Join academia and industry forces to make the case for FAIR data management
● Build capacity for high quality data management in the private and public sectors
FAIR
Cookbook
https://faircookbook.elixir-europe.org
A resource open to all!
Outline
● Overview
● Content
● Contributors’ perspectives
● User validation and sustainability
What is it?
A collection of recipes that cover the operational steps of FAIR data management
Who is it for?
Data Managers, Data Stewards, Data Curators
Software Developers, Terminology Managers
• A venue to document and share existing and new approaches or services to support FAIRification
• A way to promote a participatory culture that enables sharing of expertise by getting exposure and credit
Policymakers, Funders, Trainers
• Practical examples to recommend in policies
• To use in educational material to incentivize and guide FAIR in practice
• Introductory material
• Hands-on, technical step-by-step examples
Researchers, Data Scientists, Principal Investigators
● Over 70 recipes released and more content available
● Covering over 20 data types, incl.:
○ omics
○ pre-clinical
○ clinical areas
But not limited to these!
Coverage and learning objectives
Learn how to improve the FAIRness with exemplar datasets
Understand the levels and indicators of FAIRness
Discover open source technologies, tools and services
Find out the required skills
Acknowledge the challenges
Over 70 recipes and growing
Define what your needs are
Goal: improving visibility of content
Goal: semantic integration of datasets from multiple sources
Goal: compliance with security requirements and with regulators
Define what your needs are
Goal: improving visibility of content, e.g.:
Goal: semantic integration of datasets from multiple sources, e.g.:
Goal: compliance with security requirements and with regulators, e.g.:
https://w3id.org/faircookbook/FCB010
https://w3id.org/faircookbook/FCB007
https://w3id.org/faircookbook/FCB006
https://w3id.org/faircookbook/FCB020
https://w3id.org/faircookbook/FCB004
https://w3id.org/faircookbook/FCB014
https://w3id.org/faircookbook/FCB035
https://w3id.org/faircookbook/FCB079
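The recipe links above all follow the same persistent-identifier pattern (w3id.org/faircookbook/ followed by a recipe code such as FCB006). A minimal Python sketch of working with that pattern is shown here. The helper names `recipe_url` and `recipe_ids_in` are hypothetical, and the "FCB plus three digits" format is an assumption based only on the codes shown on this slide.

```python
import re

# Base of the persistent recipe links, as shown on the slide.
W3ID_BASE = "https://w3id.org/faircookbook/"

# Assumption: recipe codes are "FCB" followed by exactly three digits.
RECIPE_ID = re.compile(r"FCB[0-9]{3}")


def recipe_url(recipe_id):
    """Build the persistent link for a recipe code (hypothetical helper)."""
    if not RECIPE_ID.fullmatch(recipe_id):
        raise ValueError(f"not a recipe code: {recipe_id!r}")
    return W3ID_BASE + recipe_id


def recipe_ids_in(text):
    """Return the distinct recipe codes mentioned in a piece of text, sorted."""
    return sorted(set(RECIPE_ID.findall(text)))


if __name__ == "__main__":
    notes = "See w3id.org/faircookbook/FCB006 and https://w3id.org/faircookbook/FCB079"
    for code in recipe_ids_in(notes):
        print(code, "->", recipe_url(code))
```

Keeping the code and the resolver base separate means a citation only needs to store the short code, while the link itself stays stable even if the website behind it moves.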
A recipe and a template for the
FAIRification process
Anatomy of a recipe
components
Ingredients
An idea of tools/skills needed
Step by step process
Guidelines, process, description
Practical elements, code snippets
#Python3
#zooma-annotator-script.py file
def get_annotations(propertyType, propertyValues, filters = ""): "
Examples
Conclusions
What should I read next?
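The code fragment on this slide is cut off after the function signature, so the recipe's actual implementation is not shown. As a rough illustration of what such a snippet might do, here is a self-contained Python sketch that asks the EBI ZOOMA annotation service for ontology annotations. The endpoint URL, the query parameter names, and the helpers `build_annotation_url` and `_fetch_json` are assumptions made for this example and should be checked against the current ZOOMA documentation; only the name `get_annotations` and its three arguments come from the slide.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public ZOOMA endpoint; verify against the EBI documentation before use.
ZOOMA_ANNOTATE_URL = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def build_annotation_url(property_type, property_value, filters=""):
    """Build the query URL for one property value (hypothetical helper)."""
    params = {"propertyValue": property_value}
    if property_type:
        params["propertyType"] = property_type
    if filters:
        params["filter"] = filters
    return ZOOMA_ANNOTATE_URL + "?" + urlencode(params)


def _fetch_json(url):
    """Fetch a URL and parse the JSON body (performs a network call)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def get_annotations(property_type, property_values, filters="", fetch=None):
    """Return a dict mapping each property value to the service's parsed response.

    `fetch` can be replaced with a stub so the function can be tried offline.
    """
    if fetch is None:
        fetch = _fetch_json
    return {
        value: fetch(build_annotation_url(property_type, value, filters))
        for value in property_values
    }


if __name__ == "__main__":
    # Prints the request URL only; no network access is needed for this line.
    print(build_annotation_url("disease", "type 2 diabetes"))
```

Separating URL construction from the network call keeps the lookup easy to test and makes it simple to swap in a cached or mocked fetcher during a curation workflow.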
Links to complementary resources
Current links with and references to:
CC BY-SA 4.0 International
Standards in the life sciences
More than 1000 standards (category counts on the slide: 550, 303, 166, 11)
Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MISFISHIE, REMARK, CONSORT, …
Formats: SRAxml, SDTM, FASTA, DICOM, OMOP, SBRML, SEDML, CDASH, ISA, CML, MITAB, …
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Identifiers: EC number, URL, PURL, LSID, Handle, ORCID, RRID, InChI, IVOA ID, DOI, …
Standard organizations, e.g.: Grass-roots groups, e.g.:
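Some of the identifier schemes listed above carry a built-in checksum, which makes a basic sanity check possible before an identifier is stored in metadata. As a small illustration, the Python sketch below validates the format and check digit of an ORCID iD using the ISO 7064 MOD 11-2 scheme that ORCID documents. The function names are hypothetical, and the check only confirms that an iD is well formed, not that it is registered.

```python
import re

# ORCID iDs are written as four hyphen-separated groups of four characters;
# the last character is a check digit that may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string is a well-formed ORCID iD with a correct check digit."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    # The speaker's ORCID iD from the title slide.
    print(is_valid_orcid("0000-0001-5306-5690"))
```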
Tagging recipes with
‘Dataset Maturity Indicators’
Maturity level and indicators
new feature!
https://fairplus.github.io/Data-Maturity
Provides insight into the FAIR maturity reached by applying a specific recipe to improve a dataset
Who developed it?
Almost 100 life sciences professionals, researchers and data managers
FAIRplus partners
Industry
+
Academia
ELIXIR
Nodes
represented
Current operations and Editorial Board
Content prioritisation
Identification of topics
Review of drafts
Call for contributions
Monthly book-dash events
Pre-defined focus areas
Breakout on topics
Housekeeping
Technical platform
Website Martin Cook
Dominique Batista
Office of Data
Science Strategy
Become part of a community of FAIR experts
and write recipes!
1 Identify a chapter and a topic
Findability Accessibility Interoperability Reusability
Infrastructure Applied examples Assessment
2 Choose a way of contributing and see our guidelines
Google Docs
HackMD
Git
Markdown cheat sheet
Get recipe template
Tips and tricks
3 Submit an outline
You can discuss it with the Editorial Board
Credit and citability
because all contributions matter!
CRediT attribution ontology
w3id.org/faircookbook/FCB006
What has motivated so many people to contribute?
● To stay engaged in a growing community and updated with the latest developments
● To prove their FAIR competence (as individuals and as organisations)
● To expand their network of potential collaborators, clients and users
● To address common challenges and find common solutions at pre-competitive level
Utility and value based on three uses: what we have learned
● As educational material on FAIR in a training context (as part of the FAIRplus Fellowship Programme):
o showed the validity of the recipes’ content towards the intended (learning) objectives
o confirmed that some recipes require a greater amount of technical background knowledge, and have a steeper learning curve
● As practical guidance to improve day-to-day tasks for FAIRer data
● And as a contributor towards changing the culture in research data management (behind the pharmas’ firewall)
o outcomes were expressed in terms of satisfaction with the value of the recipes against specific tasks or challenges addressed
o they reported a positive contribution towards their discussion on return on investment to operationalize FAIR
https://doi.org/10.5281/zenodo.7156792
FAIR Cookbook
Internationally sustained and adopted
Thanks to
Editorial Board
Section Editors
FAIRplus partners
All bookdashes’ participants
All authors
fairplus-cookbook@elixir-europe.org
faircookbook.elixir-europe.org
fairplus-project.eu
This project has received funding from the Innovative Medicines Initiative Joint Undertaking under grant agreement No 802750. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA Companies. This communication reflects the views of the authors and neither IMI nor the European Union, EFPIA or any Associated Partners are liable for any use that may be made of the information contained herein.

FAIRcookbook: GSRS22-Singapore
