Tom Plasterer, PhD.
Integrated Informatics Semantic Framework Lead (i2SF)
The Path to Linked Data in BioPharma
Integrated R&D Informatics and Knowledge Management
R&D | RDI
Blockbuster ‘Patent Cliff’ Gives Way to Personalized Approach
Drivers & Solutions
Blockbuster Patent Cliff
Growth of Generics
Mergers & Acquisitions
Personalized Medicine
•Pharmacogenetics
•Biomarkers
American Action Forum; Primer: The Pharmaceutical Industry (Han Zhong | Updated June 2012)
IMAP Pharma & Biotech Industry Global Report 2011
Evaluate Pharma World Preview 2018
From: http://www.liv.ac.uk/pharmacogenetics/
Where do the new opportunities arise? Inside & Outside
Build from within
• Nurture ‘best in class’ programs
• Kill early
• Repositioning
Mergers & Acquisitions
• Partner or Buy?
• Integrate cultures & technology
• Is the disruption worth it?
Pre-Competitive Consortiums
• How much can be shared, and still be useful?
• Who is driving?
Finding ‘KOLs’
• Aggressive Regional Partnerships (Pfizer's Centers for Therapeutic Innovation)
• Co-locate near Academic Centers of Excellence (Novartis)
• Cherry pick (GSK, AZ, others)
Distributed Data in a Monolithic Environment
Managing Silos
Partitioned By Content
•Regulated Systems vs. Discovery
Partitioned By Geography & Organization
•US, EU, ASIAPAC
Data Formats
•RDB, Excel, Text, RSS, RDF?
Warehouses & Service Oriented Architecture
•Steps in the right direction?
Collaborative Environment
•eRooms, SharePoint, Yammer, ‘Lync’ vs. Twitter, Google Docs, Skype
Standards?
•Vendor specific or open?
•Mixed Bag
Where are the ‘smarts’?
•UI? Services?
•Metadata?
Requirements of the Informatics Landscape
• Must span the entire drug development lifecycle
o and back (post-market surveillance to discovery)
• Must support large and very heterogeneous data
o from single nucleotide polymorphisms to countries
• Will change as new science emerges & new regulations come into play
o Medline: just under 1M articles/year
• Must be able to work with multiple international regulatory bodies
o Emerging markets
• Partners, customers and collaborators will change
o and will have divergent technical aptitudes
• Must be able to interoperate with precompetitive consortia
o Can they perform common tasks for the community?
• Must be able to work with legacy data
o Lots of unmined gems here!
Maximal Agility
What’s Needed?
Linked Data!
http://thedatahub.org/group/lodcloud
LOD Cloud 2011
The 5 Stars of Open Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
★ Make your stuff available on the web (any format)
★★ Make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your stuff
★★★★★ Link your data to other people’s data to provide context
The 5 Stars of Closed (rather than Open) Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
★ Make your stuff available on the intranet rather than the web (any format)
★★ Make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your stuff
★★★★★ Link your data to other people’s data to provide context
Towards a Linked Data Architecture
Catalogues, Mapping, Queries
RDF
Active & Partial PURLs
Central Identity Management
http://research.vocab.astrazeneca.com/id/DOID/2841
http://humandiseaseontology.astrazeneca.net/DOID/2841
Structured Triplestores
Semi-Structured/Unstructured Content + Tagging
Semantic Visualization
Vocabulary Server
Search
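The PURL layer in this architecture can be illustrated with a minimal Python sketch of a "partial" redirect: a persistent prefix is registered once, and the rest of the identifier is appended to wherever the resources currently live. The prefix table, the resolve function and the example.org hosts are hypothetical placeholders that only mirror the pattern of the DOID/2841 example; a real PURL server would answer with an HTTP redirect rather than return a string.

```python
from typing import Dict, Optional

# Hypothetical partial-PURL registrations: persistent prefix -> current location.
# The example.org hosts are placeholders mirroring the slide's DOID/2841 pattern.
PARTIAL_PURLS: Dict[str, str] = {
    "http://purl.example.org/id/DOID/": "http://ontology.example.org/DOID/",
    "http://purl.example.org/id/": "http://catalogue.example.org/resources/",
}


def resolve(purl: str, table: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the current location for a PURL, or None if nothing is registered.

    The longest matching prefix wins, so a specific registration overrides a
    broader one, and the unmatched suffix is appended to the target (the
    'partial' redirect behaviour).
    """
    table = PARTIAL_PURLS if table is None else table
    matches = [prefix for prefix in table if purl.startswith(prefix)]
    if not matches:
        return None
    prefix = max(matches, key=len)
    return table[prefix] + purl[len(prefix):]


if __name__ == "__main__":
    print(resolve("http://purl.example.org/id/DOID/2841"))
    # http://ontology.example.org/DOID/2841
```

If the ontology moves, only the table entry changes; every triple that cites the persistent identifier stays valid.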
Choosing Linked Vocabularies
Current LOD Cloud Adoption
Vocabulary prefix | Vocabulary link | Number of usages in data sets
dc | http://purl.org/dc/elements/1.1/ | 92 (31.19 %)
foaf | http://xmlns.com/foaf/0.1/ | 81 (27.46 %)
skos | http://www.w3.org/2004/02/skos/core# | 58 (19.66 %)
geo | http://www.w3.org/2003/01/geo/wgs84_pos# | 25 (8.47 %)
xhtml | http://www.w3.org/1999/xhtml/vocab# | 19 (6.44 %)
akt | http://www.aktors.org/ontology/portal# | 17 (5.76 %)
bibo | http://purl.org/ontology/bibo/ | 14 (4.75 %)
mo | http://purl.org/ontology/mo/ | 13 (4.41 %)
vcard | http://www.w3.org/2006/vcard/ns# | 10 (3.39 %)
sioc | http://rdfs.org/sioc/ns# | 10 (3.39 %)
cc | http://creativecommons.org/ns# | 8 (2.71 %)
geonames | http://www.geonames.org/ontology# | 6 (2.03 %)
Source: http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms
Vocabulary Server
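The adoption figures can be re-derived as a quick sanity check when shortlisting vocabularies. The table does not state its denominator; every percentage in it is consistent with 295 data sets, so this Python sketch uses that inferred value (an assumption, not a published figure) together with the usage counts copied from the table.

```python
from typing import Dict, List

# Usage counts copied from the LOD Cloud adoption table.
USAGES: Dict[str, int] = {
    "dc": 92, "foaf": 81, "skos": 58, "geo": 25, "xhtml": 19, "akt": 17,
    "bibo": 14, "mo": 13, "vcard": 10, "sioc": 10, "cc": 8, "geonames": 6,
}

# Assumption: inferred denominator (not stated on the slide); 295 data sets
# reproduces every percentage in the table to two decimal places.
TOTAL_DATASETS = 295


def adoption_share(count: int, total: int = TOTAL_DATASETS) -> str:
    """Format a usage count as a share of all surveyed data sets."""
    return f"{100 * count / total:.2f} %"


def rank(usages: Dict[str, int], top_n: int) -> List[str]:
    """Prefixes of the top_n most widely used vocabularies (ties keep table order)."""
    return sorted(usages, key=lambda prefix: usages[prefix], reverse=True)[:top_n]


if __name__ == "__main__":
    for prefix in rank(USAGES, 3):
        print(prefix, USAGES[prefix], adoption_share(USAGES[prefix]))
    # dc 92 31.19 %
    # foaf 81 27.46 %
    # skos 58 19.66 %
```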
The 5 Stars of Open Linked Vocabularies
Bernard Vatant (Mondeca) Guidance
http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html
★ Publish your vocabulary on the Web at a stable URI
★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)
★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation
★★★★★ Link to other vocabularies by re-using elements rather than re-inventing
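The fourth star relies on content negotiation: one namespace URI returns HTML to a browser and a machine-readable file to an RDF client. The Python sketch below shows server-side selection from the HTTP Accept header; the set of offered types is an assumption, and the parser is deliberately simplified (it honours q-values and the rule that a more specific media range overrides a wildcard, but ignores other media-type parameters).

```python
from typing import List, Optional, Tuple

# Assumed set of representations served at one vocabulary namespace URI.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an HTTP Accept header into (media range, q-value) pairs."""
    ranges: List[Tuple[str, float]] = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media, q))
    return ranges


def specificity(media_range: str, offer: str) -> int:
    """2 for an exact match, 1 for type/*, 0 for */*, -1 for no match."""
    if media_range == offer:
        return 2
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*") and offer.split("/")[0] == media_range[:-2]:
        return 1
    return -1


def negotiate(header: str, available: List[str] = AVAILABLE) -> Optional[str]:
    """Return the offered type the client weights highest, or None.

    Each offer takes the q-value of the most specific range matching it;
    ties between offers fall back to the server's own preference order.
    """
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for offer in available:
        spec, q = -1, 0.0
        for media, weight in ranges:
            s = specificity(media, offer)
            if s < 0:
                continue
            if s > spec or (s == spec and weight > q):
                spec, q = s, weight
        if q > best_q:
            best, best_q = offer, q
    return best


if __name__ == "__main__":
    print(negotiate("text/turtle"))  # text/turtle
    print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))  # text/html
```

A None result corresponds to answering 406 Not Acceptable; listing HTML first means a generic */* request gets the human-readable documentation.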
Domain-Specific Vocabularies
Linked Open Vocabularies, NCBO
http://labs.mondeca.com/dataset/lov/index.html
http://bioportal.bioontology.org/
Building Linked Data Applications
Capture Business Questions and Sources
Domain Expert Concept Map
Build Formal Ontology
•Reuse Vocabularies!
Challenge with Linked Data
Model Business Questions (SPARQL)
Interact with RDF Answer in a Faceted Browser
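The last two steps, answering a business question and exploring the answer in a faceted browser, can be sketched in plain Python over an in-memory list of triples. The triples and ex: identifiers are hypothetical; match stands in for a single SPARQL triple pattern, and facet computes the value counts a faceted browser would show beside the result set. A production application would run the equivalent SPARQL against a triplestore instead.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples with made-up ex: identifiers, for illustration only.
TRIPLES = [
    ("ex:cmpd1", "ex:target", "ex:EGFR"),
    ("ex:cmpd1", "ex:phase", "Phase II"),
    ("ex:cmpd2", "ex:target", "ex:EGFR"),
    ("ex:cmpd2", "ex:phase", "Phase I"),
    ("ex:cmpd3", "ex:target", "ex:KRAS"),
    ("ex:cmpd3", "ex:phase", "Phase II"),
]


def match(triples: Iterable[Triple], predicate: str, obj: str) -> Set[str]:
    """Subjects bound by the single triple pattern ?s <predicate> <obj>."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def facet(triples: Iterable[Triple], subjects: Set[str], predicate: str) -> Dict[str, int]:
    """Count the values of one predicate across the current result set."""
    return dict(Counter(o for s, p, o in triples if p == predicate and s in subjects))


if __name__ == "__main__":
    # SPARQL analogue (PREFIX ex: declaration omitted):
    #   SELECT ?phase (COUNT(?s) AS ?n)
    #   WHERE { ?s ex:target ex:EGFR ; ex:phase ?phase }
    #   GROUP BY ?phase
    egfr = match(TRIPLES, "ex:target", "ex:EGFR")
    print(sorted(egfr))  # ['ex:cmpd1', 'ex:cmpd2']
    print(facet(TRIPLES, egfr, "ex:phase"))
```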
Improving Internal Interoperability
Scientists, clinicians and informaticists can now freely interoperate because:
• The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise.
• The Persistent URLs (PURLs) are used to connect resources found in multiple locations.
• The vocabulary server provides a way of harmonizing concepts across different domains.
o Where possible, public vocabularies are used
o Where not, they’re extended
o We don’t want to develop and maintain our own vocabularies
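The "public first, extend only where needed" policy of the vocabulary server can be sketched as a lookup with a fallback. Everything here is hypothetical: the label table stands in for concepts loaded from public vocabularies, and the extension namespace and numbering scheme are placeholders, not an existing service.

```python
from typing import Dict, Tuple

# Hypothetical label -> concept table standing in for public vocabularies.
# Synonyms point at the same concept URI.
PUBLIC_CONCEPTS: Dict[str, str] = {
    "asthma": "http://public.example.org/disease/0001",
    "bronchial asthma": "http://public.example.org/disease/0001",
    "lung cancer": "http://public.example.org/disease/0002",
}

# Hypothetical namespace for internally minted extension terms.
EXTENSION_NS = "http://vocab.example.com/ext/"


def normalise(label: str) -> str:
    """Case- and whitespace-insensitive key for label lookup."""
    return " ".join(label.lower().split())


def harmonise(label: str, extensions: Dict[str, str]) -> Tuple[str, bool]:
    """Map a local label to a concept URI; returns (uri, is_public).

    Public concepts are always preferred. Only when none exists is an
    extension term minted, and it is recorded in `extensions` so later uses
    of the same label reuse it instead of minting a duplicate.
    """
    key = normalise(label)
    if key in PUBLIC_CONCEPTS:
        return PUBLIC_CONCEPTS[key], True
    if key not in extensions:
        extensions[key] = f"{EXTENSION_NS}{len(extensions) + 1:04d}"
    return extensions[key], False


if __name__ == "__main__":
    minted: Dict[str, str] = {}
    print(harmonise("Bronchial  Asthma", minted))
    print(harmonise("In-house assay X", minted))
```

Minting only for gaps keeps the internally maintained set as small as possible, which matches the goal of not developing and maintaining vocabularies in-house.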
Inside/Outside Disappears
External / Internal
Structured Vendor Content
Consortium Content
RESTful APIs
Catalogues, Mapping, Queries
RDF
Structured Triplestores
Semi-Structured/Unstructured Content + Tagging
Active & Partial PURLs
Central Identity Management
Semantic Visualization
Vocabulary Server
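Why the inside/outside boundary fades once external and internal sources cite the same persistent identifiers can be shown in a few lines: merging RDF graphs is essentially a union of their triples, so facts about one subject accumulate without a bespoke join. The records and identifiers below are hypothetical, and the sketch ignores blank nodes, which a real RDF merge must keep apart.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical records: one external (vendor/consortium) and one internal,
# both keyed by the same persistent identifier.
PURL = "http://purl.example.org/id/compound/42"
EXTERNAL = [(PURL, "ex:vendorId", "V-123"), (PURL, "ex:inhibits", "ex:EGFR")]
INTERNAL = [(PURL, "ex:assayResult", "ex:assay-001"), (PURL, "ex:inhibits", "ex:EGFR")]


def merge(*graphs: Iterable[Triple]) -> Set[Triple]:
    """Union of triple sets; a fact stated by both sources collapses to one."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def describe(graph: Set[Triple], subject: str) -> Dict[str, List[str]]:
    """Collect every predicate/object pair for one subject, sorted for display."""
    out: Dict[str, List[str]] = {}
    for s, p, o in sorted(graph):
        if s == subject:
            out.setdefault(p, []).append(o)
    return out


if __name__ == "__main__":
    combined = merge(EXTERNAL, INTERNAL)
    print(len(combined))  # 3
    print(describe(combined, PURL))
```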
Unstructured Content
Giving Structure to Unstructured Content
o Entity Recognition
o Use of common vocabularies
o Schemas
o Domain-Specific Content? Open BEL? TMO?
o Compatibility of text indices with triplestores & middleware tools
Encouraging Publishers to Structure Content
o How can this be ‘monetized’ so they don’t lose their ROI?
o What about interoperability & persistence?
o Can this be mandated via funding agencies?
o RDFa to start?
Publishers or ‘Re-publishers’
o Thomson Reuters
o Ingenuity
o Open up vocabularies (or most of the data out there…)
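Entity recognition against a common vocabulary is the first step in giving structure to text. The slide does not name a method; the Python sketch below swaps in the simplest one, dictionary (lexicon) lookup with longest match first, so each hit is tagged with a concept URI that can then be linked into a triplestore. The lexicon entries and URIs are hypothetical placeholders.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon: lower-case surface forms -> concept URIs in a common vocabulary.
LEXICON: Dict[str, str] = {
    "non-small cell lung cancer": "http://vocab.example.com/disease/nsclc",
    "lung cancer": "http://vocab.example.com/disease/lung-cancer",
    "egfr": "http://vocab.example.com/gene/egfr",
    "gefitinib": "http://vocab.example.com/drug/gefitinib",
}


def build_pattern(lexicon: Dict[str, str]) -> re.Pattern:
    """One case-insensitive alternation; longer terms are tried first, so the
    longest dictionary entry wins at any starting position."""
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    # (?<!\w) and (?!\w) keep hits on word boundaries ("egfr" will not fire inside "egfrs").
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def tag(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface form, concept URI) for each dictionary hit."""
    pattern = build_pattern(lexicon)
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag("Gefitinib targets EGFR in non-small cell lung cancer.", LEXICON):
        print(hit)
```

Because matches do not overlap, "lung cancer" inside the longer disease name is not reported separately; a production pipeline would add synonym expansion and disambiguation on top of this.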
Pre-Competitive Consortia
Open PHACTS (Innovative Medicines Initiative)
Pistoia Alliance
W3C Health Care & Life Sciences Interest Group
National Center for Biomedical Ontology (NCBO)
Open BEL (Biological Expression Language)
Open PHACTS (Open Pharmacological Space)
• EU/EFPIA Innovative Medicines Initiative (IMI) project
From: Open PHACTS Architecture - Building the extensible platform (EuroQSAR 2012 in Vienna, 30.08.2012)
W3C HCLS
Activities:
o Continue to develop high-level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
o Implement proof-of-concept demonstrations and industry-ready code.
o Document guidelines to accelerate the adoption of the technology.
o Disseminate information about the group's work at government, industry and academic events and by participating in community initiatives.
Use Cases/Domains
o Drug Discovery
o Electronic Lab Notebooks
o Comparator Arm Data
o Patient Data Ownership
o Biotech Acquisition
o Supply Chain Automation
o Web Integration
o Bio-surveillance
o Co-development
http://www.w3.org/blog/hcls/
The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.
Pleas & Future Directions
Prognostications
RDF Content Farms
Vendors: Someone will figure out how to monetize this
Consortia: Who ‘Owns’ this?
Government in Health Care & Life Sciences; can we learn from the EPA? open.gov?
Shrinking Pharma
Smaller (or virtual) footprint
o Back to first principles: what do we do best?
More modeling & simulation
Rise of the informaticist…
Community Help
Resist Silos
Where is your data? Where is it likely to be in 5, 10 years?
A single triplestore with all ETL streams leading to an RDF ‘data warehouse’ is another silo
o Building on top of ‘standards+’ may lead to silos
Need to follow & influence emergence of standards if you have a ‘horse in the race’
Support (business-focused) Consortiums
We’re doing the same job many, many times
Thank You
Listeners & Molecular Med TRI-CON 2013 Organizers

Linked Data for Biopharma


Editor's Notes

  • #3 From 2010 through 2013, 30 blockbuster drugs with an annual sales total of approximately $98 billion have already had or will see their patents expire. The annual growth of the generic pharmaceutical industry (7.3%) is three times as high as the annual growth of the brand name pharmaceutical industry (2.4%).