Data Harmonization for a Molecularly
Driven Health System
Warren A. Kibbe, Ph.D.
Professor, Biostats & Bioinformatics
Chief Data Officer, Duke Cancer Institute
warren.kibbe@duke.edu
@wakibbe
#DataSharing
#LearningHealthSystem
#DataHarmonization
Sections
• Learning Health Systems
• Data Commons
• Data Harmonization
The World is Changing
• Pace of Commercialization
• Reach of Markets
• Role of Data
• Change in Healthcare
• Change in Computing
• Societal Changes
Is the US able to keep up?
R&D By Country
US R&D Funding as share of GDP
R&D spending / STEM
How do we continue to innovate?
Data Science
Twitter impacts science
Eric Topol
Changes in Computing
• Converged devices
• Converged IT
• Ubiquity of devices, data, mHealth
[Timeline figure; axis marks: 2002, 2007, 2012, 2017. Age annotations from the figure are in parentheses.]
• 9/16/1999 (~3 yrs old): 802.11b WiFi
• 10/23/2001 (~5 yrs old): iPod (10 GB max)
• 4/23/2005 (~8 yrs old)
• 7/15/2006
• 9/26/2006 (~9 yrs old)
• 1/9/2007 (~10 yrs old): iPhone (EDGE, 16 GB max)
• 2/7/2007
• 7/11/2008 (~11 yrs old): iPhone 3G (16 GB max)
• 4/3/2010 (~13 yrs old): iPad (EDGE, 64 GB max)
• 4/24/2012 (~15 yrs old): Google Drive
• 9/12/2012 (~15 yrs old): iPhone 5 (LTE, 128 GB max)
• 7/14/2014 (~17 yrs old)
• 3/9/2015 (~18 yrs old): Apple ResearchKit
• 4/5/2016 (~19 yrs old): HTC VR Headset
• Other labels in the figure: Google Baseline, NextGen
Courtesy of Jerry Lee, NCI
Changes in Technology
Pace of Technology Adoption
Changes in Commercialization
Changes in Oncology
• Cancer is a grand challenge
• Anatomic vs molecular classification
• Health vs Disease
Understanding Cancer
• Precision medicine will lead to fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment and
clinical presentation and direct effective,
evidence-based prevention and treatment.
Ramifications across many aspects of health care
IOM
(Now NAM)
Report
2006-11
NAM
Workshops
“Science, informatics, incentives, and
culture are aligned for continuous
improvement and innovation, with best
practices seamlessly embedded in the
delivery process and new knowledge
captured as an integral by-product of the
delivery experience.”
—Institute of Medicine
LEARNING HEALTH SYSTEMS
Another imperative is that such systems
do their work:
• Transparently (how does one learn
without well documented processes?)
• Reproducibly (good practices must
always be repeatable at scale and
scientifically reproducible)
• Only with the above can the science in
“data science” be done with sufficient
rigor
LEARNING HEALTH SYSTEMS
ASSEMBLE
ANALYZE
INTERPRET
FEEDBACK
CHANGE
LEARNING HEALTH SYSTEMS
Learning Health Systems in NEJM
Goals
• Contain rising cost of healthcare
• Maximize the value of care
• Increase public discourse and
marketplace for healthcare
Drivers
• Decision Making is too complex
• Clinical decisions are based on
practice, not evidence
• Inefficiency and waste in healthcare
Human cognitive capacity is constant
Lack of Evidence
EHRs and the Learning Health System
LHS definition
Problems for LHS to solve
Inefficient Healthcare
Poor Health in spite of high expenditures
Curve hasn’t improved
2015
View from 2006
EHRs are now ubiquitous
But evidence-driven decision support
remains a future vision
Hope
Cloud computing, data commons,
service-based computing provide some
powerful tools for solving data access,
data analysis, data analytics, and data
visualization problems at scale,
securely.
Sebastian Thrun
So what is a Data Commons
Commons Topology
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
App store/User Interface
PaaS
SaaS
IaaS
https://datascience.nih.gov/commons
Commons Compliance
• Treat products of research – data,
methods, papers etc. as digital
objects
• These digital objects exist in a
shared virtual space
• Digital object compliance through
FAIR principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
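A minimal sketch of what checking "digital object compliance" could look like in practice. The `DigitalObject` fields and the one-line rule per principle are illustrative assumptions, not part of the FAIR principles or of any NIH Commons specification; a real check would be far richer.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical metadata record for a research product (data, method, paper)."""
    identifier: str = ""      # persistent identifier, e.g. a DOI
    access_url: str = ""      # where the object can be retrieved
    media_type: str = ""      # documented, machine-readable format
    license: str = ""         # terms under which the object can be reused
    keywords: list = field(default_factory=list)


def fair_report(obj: DigitalObject) -> dict:
    """Return one pass/fail flag per FAIR principle, using deliberately simple rules."""
    return {
        "findable": bool(obj.identifier) and bool(obj.keywords),
        "accessible": obj.access_url.startswith("https://"),
        "interoperable": bool(obj.media_type),
        "reusable": bool(obj.license),
    }


def is_compliant(obj: DigitalObject) -> bool:
    """An object is treated as compliant only if every principle passes."""
    return all(fair_report(obj).values())
```

The per-principle report, rather than a single yes/no, makes it easier to tell a data provider which part of the metadata is missing.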
Data Sharing and the FAIR Principles
FAIR –
Making data
Findable,
Accessible,
Attributable,
Interoperable,
Reusable,
and provide Recognition
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
“The Commons is an effort at
creating a sharing economy and
for building community. We hope
for a more cost effective and
productive research
environment while bringing
people together in a unique
way.”
Phil Bourne
Blue Ribbon Panel Report
Cancer Moonshot℠ Blue Ribbon Panel
“The Cancer Moonshot Task Force was
directed to consult with external experts
from relevant scientific sectors, including
the presidentially appointed National
Cancer Advisory Board (NCAB).
A Blue Ribbon Panel of scientific experts
was created to advise the NCAB.”
Vision:
Enable the creation of a Learning Healthcare System for
Cancer, where as a nation we learn from the contributed
knowledge and experience of every cancer patient. As
part of the Cancer Moonshot, we want to unleash the
power of data to enhance, improve, and inform the journey
of every cancer patient from the point of diagnosis
through survivorship.
A National Cancer Data Ecosystem
Cancer Research
Data Commons
SBG CGC
Broad FireCloud ISB CGC
Courtesy NCI-CBIIT
Data Commons Framework – What Is It?
Modular Components
Secure user authentication and authorization
Metadata validation and tools
Domain-specific, extensible data models and dictionaries
API and container environment for tools and pipelines
Access to computational workspaces for storing data, tools, and
results
Reusable, expandable
framework for a Data
Commons
Core principles and
structures for a Data
Commons
Set of modular
components that can be
leveraged across Data
Commons
Narrow Middle Architecture (End-to-End Design)
1. AuthN / AuthZ
2. Metadata validation
3. Extensible data model
4. APIs for containers, workflows & tools
5. Workspaces
Data in, science out
Courtesy Bob
Grossman, U. Chicago
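As an illustration of the "metadata validation" and "extensible data model" components, here is a minimal sketch that checks a submitted record against a data dictionary. The dictionary contents, field names, and permissible values are hypothetical and are not taken from any NCI data model.

```python
# Hypothetical data dictionary: field name -> (expected type, permissible values or None).
DICTIONARY = {
    "case_id": (str, None),
    "primary_site": (str, {"Lung", "Breast", "Colon"}),
    "age_at_diagnosis": (int, None),
}


def validate(record: dict, dictionary: dict = DICTIONARY) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, (expected_type, allowed) in dictionary.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
        elif allowed is not None and value not in allowed:
            problems.append(f"{name}: value {value!r} not in dictionary")
    # Fields the dictionary does not know about are reported rather than silently kept.
    for name in record:
        if name not in dictionary:
            problems.append(f"unknown field: {name}")
    return problems
```

Keeping the dictionary as data rather than code is what makes the model extensible: adding a field or a permissible value does not require changing the validator.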
NCI Cancer Research Data Commons (CRDC) - Concept
NCI Scope: “Create a data
science infrastructure necessary
to connect repositories, analytical
tools, and knowledge bases”
Data commons co-locate data,
storage and computing
infrastructure with commonly
used services, tools & apps for
analyzing and sharing data to
create an interoperable resource
for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, “A
Case for Data Commons: Toward Data Science as a Service,” IEEE Computing in Science
& Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons
infrastructure at the University of Chicago Kenwood Data Center.
Data Commons Framework
[Diagram labels: NCI Cancer Research Data Commons]
• Clinical, Proteomics, Imaging, Genomics, Immuno-oncology, Animal Models, Cancer Biomarkers
• SBG CGC, Broad FireCloud, ISB CGC
• Clinical Proteomics Tumor Analysis Consortium*, The Cancer Imaging Archive (TCIA)*
• Elastic Compute, Query, Visualization, Analysis, Tool Deployment, Tool Repositories
• Web Interface, APIs, Data Submission
• Authentication & Authorization, Data Models & Dictionaries, Computational Workspaces, Metadata Validation & Tools
• Data Contributors and Consumers
Courtesy NCI-CBIIT
Gen3 Data Commons
Gen3 Data Commons
Gen3 Data Commons
NCI Genomic Data Commons
NCI Genomic Data Commons
NCI Genomic Data Commons
Data Harmonization
• The process of semantically and
syntactically mapping data to a set of
definitions, predefined data
elements, or a data model.
• Validation and Harmonization of
primary and secondary data is crucial
to enable analysis and reuse
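A small sketch of the two kinds of mapping named above, assuming two hypothetical source layouts that record sex and weight differently. The field names, permissible values, and target elements are invented for illustration and do not come from any specific data model.

```python
# Syntactic mapping: source field names -> target data element names.
FIELD_MAP = {"pt_sex": "sex", "gender": "sex", "wt_lb": "weight_kg", "weight_kg": "weight_kg"}

# Semantic mapping: local codes -> permissible values of the target element.
SEX_VALUES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

LB_PER_KG = 2.20462


def harmonize(record: dict) -> dict:
    """Map one source record onto the target elements, validating as it goes."""
    out = {}
    for source_field, value in record.items():
        target = FIELD_MAP.get(source_field)
        if target is None:
            continue  # unmapped fields are dropped here; a real pipeline would log them
        if target == "sex":
            key = str(value).strip().lower()
            if key not in SEX_VALUES:
                raise ValueError(f"unrecognized value for sex: {value!r}")
            out[target] = SEX_VALUES[key]
        elif source_field == "wt_lb":
            # Unit and type conversion: pounds (possibly a string) to kilograms.
            out[target] = round(float(value) / LB_PER_KG, 1)
        else:
            out[target] = float(value)
    return out
```

After harmonization, records from both layouts share one set of field names, units, and values, so they can be pooled for analysis.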
Spanning the Semantic Chasm of Despair
Building a Translational Bridge
CD2H
Thanks to Melissa Haendel
Project Highlight: Harmonizing clinical data models
Sentinel
i2b2/ACT
OMOP
PCORnet
▪ Different countries use different “outlets”.
▪ There is a need for travel adapters.
The Solution:
▪ Use a converter between various adapters.
▪ Allow researchers to ask a question once and
receive results from many different sources
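The "travel adapter" idea can be sketched as one small adapter per source that translates its rows into a common shape, so a question is written once and run everywhere. The three toy sources and their field names below are simplified stand-ins chosen for illustration; they are not the actual OMOP, PCORnet, or i2b2 schemas.

```python
# Toy sources: each stores "patient has diagnosis" under a different layout.
SOURCES = {
    "omop_like": [
        {"person_id": 1, "condition_code": "C34.90"},
        {"person_id": 2, "condition_code": "C50.919"},
    ],
    "pcornet_like": [{"PATID": "a", "DX": "C34.90"}],
    "i2b2_like": [
        {"patient_num": 7, "concept_cd": "ICD10CM:C34.90"},
        {"patient_num": 7, "concept_cd": "ICD10CM:C34.90"},
    ],
}

# One adapter per source: row -> (patient id, diagnosis code) in the common shape.
ADAPTERS = {
    "omop_like": lambda row: (row["person_id"], row["condition_code"]),
    "pcornet_like": lambda row: (row["PATID"], row["DX"]),
    "i2b2_like": lambda row: (row["patient_num"], row["concept_cd"].split(":", 1)[1]),
}


def count_patients(diagnosis_code: str) -> dict:
    """Ask the question once; return distinct patient counts per source."""
    results = {}
    for name, rows in SOURCES.items():
        to_common = ADAPTERS[name]
        patients = {pid for pid, code in map(to_common, rows) if code == diagnosis_code}
        results[name] = len(patients)
    return results
```

Adding a fourth source means writing one more adapter; the question itself does not change.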
Project Highlight: LOINC2HPO
◆ Develop a software tool to map
LOINC codes to HPO terms
◆ Develop software to convert
EHR observations into HPO
terms for use in clinical
research
Steps
Develop a tool for converting LOINC laboratory codes and values into more
phenotypically meaningful language (Human Phenotype Ontology) to allow for
translational interoperability and new analytics
LOINC code | LOINC name | Outcome
2657-5 | “Nitrite [Mass/volume] in Urine” | Numeric
20407-3 | “Nitrite [Mass/volume] in Urine by Test strip” | Numeric
32710-6 | “Nitrite [Presence] in Urine” | Positive/Negative
5802-4 | “Nitrite [Presence] in Urine by Test strip” | Positive/Negative
50558-6 | “Nitrite [Presence] in Urine by Automated test strip” | Positive/Negative
HPO: Nitrituria
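A minimal sketch of the mapping idea, using the five LOINC codes listed above. The slide does not describe how the actual LOINC2HPO tool interprets results, so the rule used here (a positive qualitative result, or a numeric result above a caller-supplied `upper_limit`) is an assumption, and the function name is hypothetical.

```python
from typing import Optional

# LOINC codes from the slide, grouped by how their result is reported.
QUANTITATIVE = {"2657-5", "20407-3"}            # Numeric
QUALITATIVE = {"32710-6", "5802-4", "50558-6"}  # Positive/Negative
HPO_TERM = "Nitrituria"


def loinc_to_hpo(loinc_code: str, value, upper_limit: Optional[float] = None) -> Optional[str]:
    """Return the HPO label if the observation indicates urinary nitrite, else None."""
    if loinc_code in QUALITATIVE:
        return HPO_TERM if str(value).strip().lower() == "positive" else None
    if loinc_code in QUANTITATIVE:
        if upper_limit is None:
            raise ValueError("a reference limit is needed to interpret a numeric result")
        return HPO_TERM if float(value) > upper_limit else None
    raise KeyError(f"no mapping for LOINC code {loinc_code}")
```

Five differently coded laboratory tests collapse to one phenotype term, which is what allows observations from different EHRs to be compared.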
INSERT CDE Browser Screenshot?
CIBMTR Center for Cancer
Research
Over 35 NCI Programs, Plus
Cancer Centers and Consortia
GDC
Data Sharing Index
• We need metrics for data, software,
algorithm use, usability, conformance
• Data sharing stimulates science,
innovation, commercialization
• Providing recognition and attribution
to data providers and software &
algorithm builders is critical for a
robust data sharing ecosystem
• Support and measure FAIRness!
Questions?
Warren Kibbe, Ph.D.
warren.kibbe@duke.edu
@wakibbe

