Making Data FAIR*
Tom Plasterer, PhD
Director, Bioinformatics, Research Bioinformatics
20 Mar 2019
* Findable, Accessible, Interoperable and Reusable
3
What FAIR: Principles at-a-Glance
Findable:
• F1 (meta)data are assigned a globally
unique and persistent identifier
• F2 data are described with rich metadata
• F3 metadata clearly and explicitly include
the identifier of the data it describes
• F4 (meta)data are registered or indexed in a
searchable resource
The FAIR Guiding Principles for scientific data management and stewardship
Sci. Data 3:160018 doi: 10.1038/sdata.2016.18 (2016)
Accessible:
• A1 (meta)data are retrievable by their identifier
using a standardized communications protocol
• A1.1 the protocol is open, free, and universally
implementable
• A1.2 the protocol allows for an authentication and
authorization procedure, where necessary
• A2 metadata are accessible, even when the data
are no longer available
Interoperable:
• I1 (meta)data use a formal, accessible,
shared, and broadly applicable language for
knowledge representation
• I2 (meta)data use vocabularies that follow
FAIR principles
• I3 (meta)data include qualified references to
other (meta)data
Reusable:
• R1 (meta)data are richly described with a plurality
of accurate and relevant attributes
• R1.1 (meta)data are released with a clear and
accessible data usage license
• R1.2 (meta)data are associated with detailed
provenance
• R1.3 (meta)data meet domain-relevant
community standards
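A few of these principles can be checked mechanically on a metadata record. The sketch below is a minimal illustration, not an official FAIR metric: the record layout, the field names (identifier, describes, license, provenance), the helper functions and the example.org identifiers are all assumptions made for this example, and an http(s) IRI is used only as a rough proxy for a globally unique, resolvable identifier.

```python
from urllib.parse import urlparse


def has_http_identifier(value):
    """True if value looks like an http(s) IRI (a rough proxy for F1/A1)."""
    parts = urlparse(value or "")
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_record(record):
    """Map a principle label to True/False for one metadata record.

    The field names are assumptions for this sketch, not a standard schema.
    """
    return {
        "F1": has_http_identifier(record.get("identifier")),
        "F3": has_http_identifier(record.get("describes")),
        "R1.1": bool(record.get("license")),
        "R1.2": bool(record.get("provenance")),
    }


# Illustrative record; the example.org identifiers are placeholders.
record = {
    "identifier": "https://example.org/metadata/ds-001",
    "describes": "https://example.org/dataset/ds-001",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": None,  # missing on purpose, so R1.2 fails
}

report = check_record(record)
failed = [label for label, ok in report.items() if not ok]
print(failed)  # ['R1.2']
```

Checks like these only cover what is machine-testable; principles such as R1.3 (community standards) still need a domain-specific judgment.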
4
Collaborative & Competitive Intelligence:
• Who do we want to partner with? Are there complementary assets to our portfolio?
• What space is too crowded and not our area of expertise?
• Greenfield situations?
Mergers, Acquisitions, Partnerships:
• How do we efficiently and deeply absorb data generated elsewhere into our systems? How
do we efficiently share?
• Does this make a smaller biotech/start-up a more viable partner?
Improved Patient Care:
• Can we share data and outcomes more efficiently in complicated trial settings (basket trials,
adaptive trials) to better engage opinion leaders and foster dialog?
• Along with Differential Privacy approaches, can we have the broader research community
help mine our data?
• How do we best reuse Real World Evidence (RWE) data in the clinic and in trial design?
Data (Ir)-reproducibility:
• Can we make preclinical data (more)-reproducible?
• Can we utilize data credentialization? (thanks to Dan Crowther @ Exscientia)
Why FAIR: Biopharma Value Proposition
5
Why FAIR: €26bn Reasons…
6
When FAIR: A Brief History
Moving away from Narrative
• Nanopublications
Incubating Standards in Open PHACTS
• VoID, PROV-O
Lorentz Center Workshop
• FORCE 11 FAIR Guiding Principles
• Participants: IMI members, US researchers,
Content providers, ELIXIR; European Open
Science Cloud, Big Data to Knowledge (BD2K)
Current Status:
• FAIR Data Workshops (EU-ELIXIR nodes)
• Inclusion in Horizon 2020, NIH Advocacy
• IMI2 Data FAIR-ification Call
• Vendors getting up to speed
7
Linked Data Community of Practice
How familiar are you with the
FAIR principles and metrics?
When FAIR: Community Awareness
8
Linked Data Community of Practice
What is the maturity
level of your
organization with
respect to
implementation of
FAIR?
When FAIR: Getting Started
9
How FAIR: Pistoia FAIR Implementation Group
• Business challenge:
- Effective application and analysis of data
assets in the life science industry demands that
they are made Findable, Accessible,
Interoperable and Reusable
• Update and plans:
- Workshop at The Hyve, Utrecht NL in June
2018 resulted in a published feature
article
- Workshop at EPAM, Boston US in Dec
2018 contributed to the business case
thinking
- Phase 1 plans for 2019:
• Develop the business case to define
distinctive role for the project
• Develop the FAIR Toolkit concept
• Select a use case: e.g. clinical science
to engage with CROs at a workshop
- Seeking more funding – join us!
PM: Ian Harrow
Collaborators
FAIR Toolkit Implementation for LS Industry
1. Metric Tools & Best Practice
2. Training resources
3. Culture change process
4. Use case examples
5. Cost benefit examples
• Adapt for Life Science industry
• Leverage existing FAIR resources
10
How FAIR: Pistoia Ontologies Mapping Project
• Business challenge:
– Use of different ontologies within the
same data domain hampers
interoperability and application.
Solve by mapping between them.
• Update and plans:
– Phase 3 completed by end of 2018
• Predicted mappings delivered as a
prototype Ontology Mapping Service
for phenotype and disease domain
• Mappings will be available through
public wiki and OxO mapping repository
at EMBL-EBI
• The mapping algorithm, Paxo, is openly
available on GitHub
– Phase 4 plans for 2019:
• To extend mapping of biological and
chemical ontologies for support of
laboratory analytics
• FAIR implementation is planned
– Seeking more funding – join us!
Partners
PM: Ian Harrow
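The idea of solving the interoperability problem by mapping between ontologies can be sketched as a lookup table applied to annotated records. This is an illustration only: it is not the Paxo algorithm or the OxO service, and the term identifiers (ONTO_A:..., ONTO_B:...), the confidence scores, the 0.9 threshold and the helper names are hypothetical.

```python
# Hypothetical predicted mappings: (source term, target term, confidence score).
MAPPINGS = [
    ("ONTO_A:0001", "ONTO_B:1001", 0.98),
    ("ONTO_A:0002", "ONTO_B:1002", 0.61),
]


def build_lookup(mappings, min_score=0.9):
    """Keep only mappings at or above min_score, keyed by source term."""
    return {src: tgt for src, tgt, score in mappings if score >= min_score}


def translate(term_ids, lookup):
    """Split term_ids into translated target terms and terms left unmapped."""
    mapped = [lookup[t] for t in term_ids if t in lookup]
    unmapped = [t for t in term_ids if t not in lookup]
    return mapped, unmapped


lookup = build_lookup(MAPPINGS)
mapped, unmapped = translate(
    ["ONTO_A:0001", "ONTO_A:0002", "ONTO_A:0003"], lookup
)
print(mapped)    # ['ONTO_B:1001']
print(unmapped)  # ['ONTO_A:0002', 'ONTO_A:0003']
```

Returning the unmapped terms separately keeps low-confidence or missing mappings visible for curation instead of silently dropping them.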
11
How FAIR:
12
How FAIR: Implementation Networks
13
How FAIR:
Overview:
• ELIXIR - Project Coordinator & Janssen - Project Leader
• 22 participants with 12 academic, 7 EFPIA, 3 SME
• €8.23M budget with €4M H2020 EC funding + €4.23M EFPIA in-kind
• 42 months
Goals:
• Establish a value-based process for prioritization and selection of IMI project databases
• Develop FAIRification toolkit e.g. develop guidelines, tools and metrics - FAIR Cookbook
• Apply this toolkit to FAIRify datasets from selected IMI projects and EFPIA companies
• Deliver training for data handlers (academia, SMEs and pharmaceuticals) to change and
sustain the data management culture
• Foster an innovation ecosystem on FAIR open data to power future reuse, knowledge
generation and societal benefit e.g. FAIR innovation and SME events
Members:
PM: Serena Scollen
14
How FAIR: Concept
15
How FAIR: FAIR Metrics &
17
Start FAIR: Find me Datasets about:
Projects
Study
Indication/
Disease
Technology
Targets
Cohort
Dates
Agent
Therapeutic
Area
Drugs
18
Dataset Catalog is a collection of Dataset Records
• Catalogs are needed to support FAIR (Findable) data
• Catalogs can and should support Enterprise MDM strategies
• Consumers can be internal or external
Dataset Catalogs are needed so data consumers can find Datasets
• Dataset records need sufficient metadata to support discoverability
• Dataset terms are NOT the data instance
Dataset Catalogs surface dataset provenance and enable data access
Dataset Catalogs can provide datasets for multiple consumption patterns
• Analytics readiness and fit
• ‘Walking’ across information models
Start FAIR: Findability Starts with Catalogs
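A "find me datasets about ..." query over catalog records can be sketched as an inverted index from facet values to record identifiers. The facet names follow the slide (indication, technology, therapeutic area), but the record contents, the identifiers and the build_index / find_datasets helpers are assumptions for illustration. Note that the index holds dataset descriptions only, not the data instances.

```python
from collections import defaultdict

# Hypothetical dataset records: record id -> facet -> list of values.
RECORDS = {
    "ds-001": {"indication": ["asthma"], "technology": ["RNA-seq"],
               "therapeutic_area": ["respiratory"]},
    "ds-002": {"indication": ["asthma"], "technology": ["proteomics"],
               "therapeutic_area": ["respiratory"]},
    "ds-003": {"indication": ["heart failure"], "technology": ["RNA-seq"],
               "therapeutic_area": ["cardiovascular"]},
}


def build_index(records):
    """Build an inverted index: (facet, lower-cased value) -> set of record ids."""
    index = defaultdict(set)
    for record_id, facets in records.items():
        for facet, values in facets.items():
            for value in values:
                index[(facet, value.lower())].add(record_id)
    return index


def find_datasets(index, **criteria):
    """Return sorted record ids that match every facet=value criterion."""
    matches = None
    for facet, value in criteria.items():
        ids = index.get((facet, value.lower()), set())
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


index = build_index(RECORDS)
print(find_datasets(index, indication="asthma", technology="RNA-seq"))
# ['ds-001']
print(find_datasets(index, technology="rna-seq"))
# ['ds-001', 'ds-003']
```

A record with too few facets filled in simply never appears in results, which is the practical meaning of "sufficient metadata to support discoverability".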
19
Start FAIR: A DCAT conformant Data Catalog
https://www.w3.org/TR/hcls-dataset/
https://www.w3.org/TR/vocab-dcat/#vocabulary-overview
Semantic tagging of datasets with
concepts from taxonomies:
• provides context
• multi-dimensional & flexible
• effective for discoverability
• light-weight semantics
skos:Concept
dcat:Catalog skos:ConceptScheme
dctypes:Dataset (summary)
dct:title
dct:publisher <foaf:Agent>
foaf:page
void:sparqlEndpoint
dct:accrualPeriodicity
dcat:keyword
dcat:dataset
dcat:theme
dctypes:Dataset (version)
dcat:Distribution
(dctypes:Dataset)
void:vocabulary
dct:conformsTo
void:exampleResource
…other void properties
dcat:distribution
dcat:themeTaxonomy
dct:isVersionOf
pav:previousVersion
dct:hasPart
pav:hasCurrentVersion
dct:hasPart
dct:title
dct:publisher <foaf:Agent>
pav:version
dct:creator <foaf:Agent>
dct:created
dct:source
dct:creator <foaf:Agent>
dct:license
dct:format
pav:retrievedFrom
dct:created
pav:createdWith
dcat:accessURL
dcat:downloadURL
void:Dataset
dct:title
dct:description
dct:publisher <foaf:Agent>
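The catalog, dataset and distribution levels named above can be sketched as a small JSON-LD document. This is a minimal example under stated assumptions, not a complete or validated profile: only a handful of the listed properties are used, the example.org IRIs, titles and keywords are placeholders, and the make_distribution / make_dataset helpers are hypothetical.

```python
import json

# Namespace IRIs for the prefixes used on the slide.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dctypes": "http://purl.org/dc/dcmitype/",
}

BASE = "https://example.org/catalog/"  # placeholder namespace


def make_distribution(dataset_id, fmt, url):
    """Describe one downloadable form of a dataset as a dcat:Distribution."""
    return {
        "@id": f"{BASE}{dataset_id}/distribution/{fmt}",
        "@type": "dcat:Distribution",
        "dct:format": fmt,
        "dcat:downloadURL": {"@id": url},
    }


def make_dataset(dataset_id, title, keywords, distributions):
    """Describe a dataset record (metadata only, not the data instance)."""
    return {
        "@id": f"{BASE}{dataset_id}",
        "@type": "dctypes:Dataset",
        "dct:title": title,
        "dcat:keyword": keywords,
        "dcat:distribution": distributions,
    }


dataset = make_dataset(
    "ds-001",
    "Example asthma cohort expression data",
    ["asthma", "RNA-seq"],
    [make_distribution("ds-001", "csv", "https://example.org/files/ds-001.csv")],
)

catalog = {
    "@context": CONTEXT,
    "@id": BASE,
    "@type": "dcat:Catalog",
    "dct:title": "Example R&D dataset catalog",
    "dcat:dataset": [dataset],
}

catalog_json = json.dumps(catalog, indent=2)
print(catalog_json)
```

The version level, provenance (pav:) and VoID properties from the slide would be added in the same way as further keys on the dataset and distribution dictionaries.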
Start FAIR: Dataset to Knowledge Graph to Analytics
Data Catalog Filter
Phase 1
Experiment Metadata Filter
Phase 2
Ad hoc Analyses Filtering
Phase 3
Outbound
to Data Analytics
Data Science
Tools
Statistical
Filtering
e.g., clinical trial with > 50
participants
Dataset
Catalog
Descriptions
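The three filtering phases can be sketched as successive narrowing steps over dataset descriptions before anything is handed to analytics tools. The record fields, the sample values and the function names are assumptions for illustration; only the "clinical trial with > 50 participants" criterion comes from the slide.

```python
# Hypothetical dataset descriptions as they might come out of a catalog.
RECORDS = [
    {"id": "ds-001", "indication": "asthma", "study_type": "clinical trial",
     "participants": 120, "has_biomarkers": True},
    {"id": "ds-002", "indication": "asthma", "study_type": "clinical trial",
     "participants": 35, "has_biomarkers": True},
    {"id": "ds-003", "indication": "asthma", "study_type": "clinical trial",
     "participants": 300, "has_biomarkers": False},
    {"id": "ds-004", "indication": "COPD", "study_type": "clinical trial",
     "participants": 500, "has_biomarkers": True},
]


def catalog_filter(records, indication):
    """Phase 1: narrow by catalog-level description."""
    return [r for r in records if r["indication"] == indication]


def experiment_filter(records, min_participants=50):
    """Phase 2: narrow by experiment metadata (clinical trial with > 50 participants)."""
    return [
        r for r in records
        if r["study_type"] == "clinical trial" and r["participants"] > min_participants
    ]


def ad_hoc_filter(records, predicate):
    """Phase 3: apply an analysis-specific condition supplied by the user."""
    return [r for r in records if predicate(r)]


phase1 = catalog_filter(RECORDS, "asthma")
phase2 = experiment_filter(phase1)
phase3 = ad_hoc_filter(phase2, lambda r: r["has_biomarkers"])

# Outbound: identifiers handed on to data science tools.
selected = [r["id"] for r in phase3]
print(selected)  # ['ds-001']
```

Running the cheap catalog-level filter first means the later, more specific filters only touch datasets that are already known to be relevant.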
R&D | RDI
Why FAIR?
• Cost avoidance, Business Advantage, Data Stewardship
When FAIR?
• Now! Peers, especially in Europe, are doing it
How FAIR?
• FAIRplus, GO-FAIR, Pistoia FAIR Implementation Group
Start FAIR
• Findability first, adopt a FAIR-compliant Data Catalog
FAIR-for-Biopharma: Take-aways
R&D | RDI
Thanks
Key Influencers
David Wood
Tim Berners-Lee
Lee Harland
Jane Lomax
James Malone
Dean Allemang
Barend Mons
Carole Goble
Bernadette Hyland
Bob Stanley
Eric Little
Michel Dumontier
John Wilbanks
Hans Constandt
Filip Pattyn
Tim Hoctor
Kees van Bochove
Serena Scollen
AstraZeneca/Pistoia FAIR
Data Community
Mathew Woodwark
Rajan Desai
Nic Sinibaldi
Chia-Chien Chiang
Kerstin Forsberg
Ola Engkvist
Ian Dix
Colin Wood
Ted Slater
Martin Romacker
Eric Neumann
John Wise
Carmen Nitsche
Ian Harrow
Jeff Saltzman
Kathy Reinold


Editor's Notes

  • #4: Erik Schultes’ talk: Ready, Set, GO-FAIR: https://vimeo.com/282650465
  • #5: 50% (or more) of preclinical research could not be reproduced, at a cost of $28B/year http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165 Pistoia paper: Implementation and relevance of FAIR data principles in biopharmaceutical R&D; https://www.ncbi.nlm.nih.gov/pubmed/30690198
  • #6: https://dx.doi.org/10.2777/02999 https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1/language-en
  • #7: Horizon 2020 is the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020)
  • #16: http://fairmetrics.org/ https://fairshake.cloud/?q=TCGA
  • #18: Images: http://senior-project-led-cube.wikispaces.com/ (https://creativecommons.org/licenses/by-sa/3.0/) http://opensource.org/node/688 (https://creativecommons.org/licenses/by/4.0/)