Behind the Scenes of KnetMiner:
Towards Standardised and Interoperable
Knowledge Graphs
Harpenden, 3/6/2018

Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
Find these slides on SlideShare
KnetMiner-inspired Artwork

by Hugo Dalton (hugodalton.com)
Behind the scenes of KnetMiner
Putting it on a Bigger Picture
<concept>
  <id>1</id>
  <pid>Q75WV3</pid>
  <description/>
  <elementOf>
    <idRef>UNIPROTKB-SwissProt</idRef>
  </elementOf>
  <ofType>
    <idRef>Protein</idRef>
  </ofType>
  <evidences>
    <evidence>
      <idRef>IMPD</idRef>
    </evidence>
  </evidences>
  <conames>
    <concept_name>
      <name>Probable trehalose-phosphate phosphatase 1</name>
      <isPreferred>true</isPreferred>
    </concept_name>
    …
<cc>
  <id>Protein</id>
  <fullname>Protein</fullname>
  <description>
    A protein is comprised of one or more Polypeptides
    and potentially other molecules.
  </description>
  <specialisationOf>
    <idRef>MolCmplx</idRef>
  </specialisationOf>
</cc>
<relation>
  <fromConcept>1</fromConcept>
  <toConcept>3</toConcept>
  <ofType>
    <idRef>participates_in</idRef>
  </ofType>
  <evidences>
    <evidence>
      <idRef>ECO:0000316</idRef>
    </evidence>
  </evidences>
  <relgds/>
</relation>
<concept>
  <id>3</id>
  <pid>GO:0009651</pid>
  <description>response to salt stress</description>
  <ofType><idRef>BioProc</idRef></ofType>
  <coaccessions>
    <concept_accession>
      <accession>GO:0009651</accession>
      <elementOf><idRef>GO</idRef></elementOf>
      <ambiguous>false</ambiguous>
    </concept_accession>
  </coaccessions>
</concept>
Is XML/OXL Enough?
A Brief History of Data Models/Formats
The Semantic Web Approach: RDF
URI Resolution
@prefix bkr: <http://www.ondex.org/bioknet/resources/> .
@prefix bk: <http://www.ondex.org/bioknet/terms/> .
@prefix bka: <http://www.ondex.org/bioknet/terms/attributes/> .

bkr:TOB1 a bk:Protein ;
  bk:participates_in <http://www.wikipathways.org/id1> ;
  bk:prefName "TOB1";
  bk:published_in bkr:23236473.

The Turtle Syntax:
https://www.w3.org/TR/turtle/
Schema/Ontologies
Data store
Schema store
Sharing Identifiers via URIs
Data store
Schema store
Wikipathways
Mapping Data for Interoperability
Our Data Model: The BioKNO Ontology
wp:id1
  a bk:Path ; # a subclass of bk:Concept
  bk:evidence bkev:IMPD ; # Imported from database, a predefined resource type.
  bk:prefName "Bone Morphogenic Protein (BMP) Signalling and Regulation".

bkr:TOB1 a bk:Protein ;
  dc:identifier bkr:TOB1_acc ;
  bk:prefName "TOB1 HUMAN";
  # A simplified link, hiding the BioPax chain:
  # pathwayComponent -> BioChemicalReaction|Complex -> Protein
  bk:participates_in wp:id1;
  bk:is_annotated_by obo:GO_0030014. # Same URI as the OBO Gene Ontology Term.

# Structured accession, allows for linking of identifier and context.
bkr:TOB1_acc a bk:Accession ;
  dcterms:identifier "TOB1";
  # instance of bk:DataSource. Another predefined entity.
  bk:dataSource bkds:UNIPROTKB.
BioKNO: Biological Entities
# For practical reasons, we expect the straight triple to always be
# asserted, with the reified version optionally added to it.
bkr:TOB1 bk:published_in bkr:20068231.

bkr:citation_TOB1_15489334 a bk:Relation ;
  # the same properties that are used for regular relations
  bk:relTypeRef bk:published_in;
  bk:relFrom bkr:TOB1 ;
  bk:relTo bkr:15489334 ;
  # An attribute
  bka:score 0.95 ;
  # Both attributes and object properties can be linked to a
  # reified relation.
  bk:evidence bkev:TextMining.
Attributes in Reified Relations
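The pairing of a straight triple with an optional reified bk:Relation can be sketched in plain Java. This is only an illustration of the pattern shown in the Turtle above, not the converter actually used in KnetMiner: the class name ReifiedRelationWriter, the method toTurtle and the way the relation URI is passed in are all assumptions made for this example, and prefix declarations are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, for illustration only: writes a relation as Turtle text. */
final class ReifiedRelationWriter {

    /**
     * Emits the straight triple first, then a bk:Relation resource that repeats
     * the relation's type and endpoints and carries its attributes.
     * All arguments are expected to be prefixed names or literals already valid in Turtle.
     */
    static String toTurtle(String relUri, String from, String type, String to,
                           Map<String, String> attributes) {
        StringBuilder out = new StringBuilder();
        // The straight triple is always asserted.
        out.append(from).append(' ').append(type).append(' ').append(to).append(" .\n");
        // The reified version is added alongside it.
        out.append(relUri).append(" a bk:Relation ;\n");
        out.append("  bk:relTypeRef ").append(type).append(" ;\n");
        out.append("  bk:relFrom ").append(from).append(" ;\n");
        out.append("  bk:relTo ").append(to);
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            out.append(" ;\n  ").append(attr.getKey()).append(' ').append(attr.getValue());
        }
        out.append(" .\n");
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("bka:score", "0.95");
        attrs.put("bk:evidence", "bkev:TextMining");
        System.out.print(toTurtle(
            "bkr:citation_TOB1_15489334", "bkr:TOB1", "bk:published_in", "bkr:15489334", attrs));
    }
}
```

Keeping the straight triple means simple queries never need to know about reification, while queries that need scores or evidence can go through the bk:Relation resource.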
Talking to the Rest of The World

BioKNO | External Ontologies | Mapping Type
bk:Concept | skos:Concept | Subclass
bk:Relation | rdf:Statement | Subclass
bk:relFrom, bk:relTypeRef, bk:relTo | rdf:subject, rdf:predicate, rdf:object | Subproperties (ie, mapping to RDF reified statements)
bk:Path, bk:Participant, bk:Interaction, bk:Transport, bk:Protein, bk:Gene | Classes with same names in BioPAX and SIO | Equivalent Class
bk:participates_in, bk:has_participant | Relation Ontology (RO) properties with same names; biopax:participant (as sub-property) | Equivalent property
bk:produces, bk:produced_by, bk:consumes, bk:consumed_by | biopax:product (as sub-property); RO properties with same names | Equivalent property
bk:regulates, bk:positively_regulates, bk:negatively_regulates | RO properties with same names | Equivalent property
bk:is_a; bk:part_of, bk:has_part; bk:occurs_in, bk:co_occurs_with | skos:broader; Basic Formal Ontology (BFO)/RO properties with same names | Equivalent property
bk:Publication | schema:CreativeWork | Subclass
bka:abstract, bka:title (also known as AbstractHeader), bka:authors | dcterms:description, dcterms:title, dc:creator | Sub-property
How to Serve and Query RDF?
Typical RDF (and Data) Architecture
How to Use it, Concretely?
Playground: SPARQL Browsers
How to Use it, Concretely?
Programmatically: RDF Frameworks (Jena in this case)
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Literal;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.sparql.engine.http.QueryEngineHTTP;

// URL of the SPARQL endpoint to query
String service = "http://localhost:3030/ds/query";
String sparql =
  "PREFIX bk: <http://www.ondex.org/bioknet/terms/>\n" +
  // … (elided on the slide)
  "\n" +
  "\n" +
  "SELECT DISTINCT ?pmid ?title ?year ?pub \n" +
  "{\n" +
  "  ?prot a bk:Protein;\n" +
  "    bk:prefName 'TOB1'.\n" +
  "  \n" +
  "  ?pubRel a bk:Relation;\n" +
  "    bk:relFrom ?prot;\n" +
  "    bk:relTo ?pub;\n" +
  "    bka:Score ?score.\n" +
  "  \n" +
  "  FILTER ( ?score > 0.90 )\n" +
  "  \n" +
  "  ?pub \n" +
  "    bka:PMID ?pmid ;\n" +
  "    bka:YEAR ?dyear;\n" +
  "    bka:abstractHeader ?title\n" +
  "\n" +
  "  BIND ( xsd:int ( ?dyear ) AS ?year )\n" +
  "}\n" +
  "LIMIT 1000";

// Parse the query and send it to the remote endpoint
Query query = QueryFactory.create ( sparql );
QueryEngineHTTP qexec = QueryExecutionFactory.createServiceRequest(
  service, query
);
ResultSet results = qexec.execSelect() ;

// Each solution binds the variables listed in the SELECT clause
results.forEachRemaining ( (QuerySolution soln ) ->
{
  Resource pubNode = soln.getResource ( "pub" );
  String uri = pubNode.getURI ();
  Literal titleNode = soln.getLiteral ( "title" );
  String title = titleNode.getString ();
  String titleLang = titleNode.getLanguage ();
  Literal yearNode = soln.getLiteral ( "year" );
  int year = yearNode.getInt ();
  System.out.format (
    "Publication ID: <%s>, title: %s (in %s), year: %d\n",
    uri, title, titleLang, year
  );
});
// Release the HTTP connection once the results have been consumed
qexec.close ();
SPARQL for Extraction, Loading, Transformation
(The Simpler-than-Ondex Way)
CONSTRUCT {
  ?path a bk:Path;
    bk:prefName ?pathName;
    bk:evidence bkev:IMPD.
  ?bkProt a bk:Protein;
    dc:identifier ?bkProtAccUri;
    bk:prefName ?protName;
    bk:participates_in ?path.
  ?bkProtAccUri a bk:Accession;
    dcterms:identifier ?protName;
    bk:dataSource bkds:UNIPROTKB.
}
WHERE
{
  ?path a bp:Pathway;
    bp:displayName ?pathName;
    bp:pathwayComponent ?comp.
  {
    ?comp a bp:BiochemicalReaction;
      bp:left|bp:right ?protein.
  }
  UNION {
    ?comp a bp:Complex;
      bp:component ?protein.
  }
  ?protein a bp:Protein;
    bp:displayName ?protName.
  BIND ( IRI ( CONCAT ( STR ( bkr: ), STR ( ?protName ) ) ) AS ?bkProt )
  BIND ( IRI ( CONCAT ( STR ( ?bkProt ), "_acc" ) ) AS ?bkProtAccUri )
}
SPARQL/RDF for ELT
• TARQL: Using SPARQL to RDF-Convert Tabular CSV Files
• RDF/XML can be transformed via XSL
  • We have done it for bio-specific ontology definitions in Ondex
• Programmatic conversions
  • Using RDF frameworks, eg, Jena, RDF4J (former Sesame), rdflib for Python
  • See also java2rdf (https://github.com/EBIBioSamples/java2rdf)
  • We have used it for the Ondex->RDF converter
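To give an idea of what a programmatic conversion involves, here is a minimal sketch in plain Java that turns two-column CSV rows (protein, pathway) into Turtle statements. It deliberately avoids any RDF framework, so it is neither TARQL nor java2rdf; the class name CsvToTurtle, the column layout and the use of the bkr:, bk: and wp: prefixes for the output are assumptions made for this example, and it assumes the cell values are safe to use as local names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical converter, for illustration only: CSV rows to Turtle statements. */
final class CsvToTurtle {

    /**
     * Each non-empty line must have exactly two comma-separated columns:
     * a protein identifier and a pathway identifier.
     */
    static List<String> convert(List<String> csvLines) {
        List<String> statements = new ArrayList<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] cols = line.split(",", -1);
            if (cols.length != 2) {
                throw new IllegalArgumentException("Expected 2 columns: " + line);
            }
            String protein = cols[0].trim();
            String pathway = cols[1].trim();
            statements.add(
                "bkr:" + protein + " a bk:Protein ; bk:participates_in wp:" + pathway + " .");
        }
        return statements;
    }

    public static void main(String[] args) {
        for (String statement : convert(Arrays.asList("TOB1,id1", "", "TOB2,id1"))) {
            System.out.println(statement);
        }
    }
}
```

A real conversion would normally build the triples through a framework such as Jena or RDF4J, which takes care of escaping and serialisation.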
The Bigger Picture
https://www.economist.com/node/21521548
https://goo.gl/n4m5xL
Artificial Intelligence (AI)
The Bigger Picture: Linked Open Data
https://lod-cloud.net/
In the Life Sciences
Another Graph Database World
The Cypher Query/DML Language
Proteins->Reactions->Pathways:
// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: 'apoptosis' })
// further conditions, not always so performant
WHERE prot.name =~ '(?i)^DNA.+'
// Usual projection and post-selection operators
RETURN prot.name, pway
// Relations can have properties
ORDER BY csby.pvalue
LIMIT 1000

Proteins->Reactions->Pathways:
// Single-path (or same-direction branching) easy to write
MATCH (prot:Protein) - [:produced_by|consumed_by] -> (:Reaction)
  - [:part_of*1..3] -> (pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000

// Very compact forms available, depending on the data
MATCH (prot:Protein) - (pway:Path) RETURN pway
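The variable-length pattern [:part_of*1..3] asks for nodes that can be reached by following between one and three part_of edges. The sketch below mimics that idea over a small in-memory adjacency map. It is a simplified illustration, not how Neo4j evaluates the pattern: the class BoundedHops, the method reachable and the node names are invented for the example, and it does not reproduce Cypher's rule that a relationship is used at most once per path, which only matters when the graph has cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, for illustration only: bounded-hop reachability. */
final class BoundedHops {

    /** Returns the nodes reached from start by following between minHops and maxHops edges. */
    static Set<String> reachable(Map<String, List<String>> edges, String start,
                                 int minHops, int maxHops) {
        Set<String> result = new LinkedHashSet<>();
        List<String> frontier = Collections.singletonList(start);
        for (int hop = 1; hop <= maxHops && !frontier.isEmpty(); hop++) {
            // Expand every node of the current frontier by one edge.
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                next.addAll(edges.getOrDefault(node, Collections.emptyList()));
            }
            // Only nodes at an allowed distance are kept as matches.
            if (hop >= minHops) result.addAll(next);
            frontier = next;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> partOf = new HashMap<>();
        partOf.put("reaction1", Arrays.asList("pathA"));
        partOf.put("pathA", Arrays.asList("pathB"));
        partOf.put("pathB", Arrays.asList("pathC"));
        partOf.put("pathC", Arrays.asList("pathD"));
        // pathD is four hops away, so it falls outside the 1..3 range
        System.out.println(reachable(partOf, "reaction1", 1, 3));
    }
}
```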
Cypher as Semantic Motif Language
The rdf2neo Tool
SELECT ?iri
{
  ?label rdfs:subClassOf* bk:Concept.
  ?iri a ?label.
}

SELECT ?label
{
  {
    ?iri a ?label.
    ?label rdfs:subClassOf* bk:Concept.
  }
  UNION {
    # it's always instance of concept
    BIND ( bk:Concept AS ?label )
    BIND ( ?iri AS ?iri )
  }
}

SELECT ?name ?value
{
  {
    ?iri ?name ?value.
    VALUES ( ?name ) {
      (dcterms:identifier)
      (dcterms:description)
      (rdfs:comment)
      (bk:prefName)
      (bk:altName)
    }
  }
  UNION {
    ?iri ?name ?value.
    ?name rdfs:subPropertyOf* bk:attribute.
  }
}

https://github.com/Rothamsted/rdf2neo
How to Use it, Concretely?
Playground: The Neo4j Browser
How to Use it, Concretely?
Programmatically: The Neo4j Drivers (for Java in this case)
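// Package names below are those of the 1.x Neo4j Java driver
import org.neo4j.driver.v1.AuthToken;
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.Statement;
import org.neo4j.driver.v1.StatementResult;

AuthToken auth = AuthTokens.basic ( "neo4j", "test" );
// Driver and session are closed automatically at the end of the try block
try (
  Driver neodb = GraphDatabase.driver ( "bolt://127.0.0.1:7687", auth );
  Session session = neodb.session ();
)
{
  String cypher =
    "MATCH (prot:Protein{ prefName:'TOB1' }) - [r:published_in] -> (pub)\n" +
    "WHERE toFloat ( r.Score ) > 0.9\n" +
    "RETURN pub.PMID, pub.AbstractHeader, pub.YEAR\n" +
    "ORDER BY pub.YEAR DESC\n" +
    "LIMIT 30";
  Statement stmt = new Statement ( cypher );
  StatementResult rs = session.run ( stmt );
  // Result columns are named after the expressions in the RETURN clause
  rs.forEachRemaining ( rec -> {
    String pmid = rec.get ( "pub.PMID" ).asString ();
    String title = rec.get ( "pub.AbstractHeader" ).asString ();
    String year = rec.get ( "pub.YEAR" ).asString ();
    System.out.format (
      "PMID: %s, Title: \"%s\", year: %s\n",
      pmid, title, year
    );
  });
}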
Triple Stores vs Prop Graphs

Data exchange format
  Neo4j, Cypher DBs, Graph DBs:
    - No official one, just Cypher, Support for GraphML, RDF
    +/- Focus on backing applications
  Semantic Web/Triple Stores:
    + Focus on data sharing standards

Data model
  Neo4j, Cypher DBs, Graph DBs:
    + Relations with properties
    - Metadata/schemas/ontologies management
  Semantic Web/Triple Stores:
    - Relations cannot have properties (reification required)
    + Metadata/schemas/ontologies as first-class citizens and standardised OWL

Performance
  Neo4j, Cypher DBs, Graph DBs:
    + complex graph traversals
  Semantic Web/Triple Stores:
    + Comparable in most cases

Query Language
  Neo4j, Cypher DBs, Graph DBs:
    + Cypher is easier (eg, compact, implicit elems)?
    - Expressivity issues (unions)
    - No standard QL (but efforts in progress, eg, OpenCypher)
  Semantic Web/Triple Stores:
    - SPARQL is Harder? (URIs, namespaces, verbosity)
    + SPARQL More expressive

Standardisation, openness
  Neo4j, Cypher DBs, Graph DBs:
    +/- (TinkerPop is open, Neo4j isn’t)
    + Commercial support
    + More alive and up to date (e.g., support for Hadoop, nice Neo4j browser, easy installation)
  Semantic Web/Triple Stores:
    + Natively open, many open implementations
    - Instability and many short-lived prototypes
    - Advancements seem to be slowing down
    + Some nice open and commercial browser (LODEStar,

Scalability, big data
  Neo4j, Cypher DBs, Graph DBs:
    +/- Commercial support to clustering/clouds for Neo4j
    + Open support in TinkerPop
  Semantic Web/Triple Stores:
    + Load Balancing/Cluster solutions, Commercial Cloud support (eg GraphDB)
    + SPARQL Over TinkerPop (via SAIL interface)
Supporting Web APIs via JSON
{
  "type": "Protein",
  "id": "TOB1",
  "prefName": "TOB1 Human",
  "participates_in":
  {
    "type": "Pathway",
    "id": "id1",
    "evidence": "IMPD",
    "prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
  },
  "is_annotated_by": "GO_0030014"
}
• Designed to be compatible with browsers, i.e., JavaScript
• Language of choice for web APIs, consumption by web browsers, and dynamic web interfaces (i.e., AJAX)
• Conceptually similar to XML (trees, nested structures)
• Often used in a lightweight way, without many schema constraints
Bridging to RDF: JSON-LD
{
  "@context": {
    "bk": "http://www.ondex.org/bioknet/terms/",
    "bka": "http://www.ondex.org/bioknet/terms/attributes/",
    "bkds": "http://www.ondex.org/bioknet/terms/dataSources/",
    "bkev": "http://www.ondex.org/bioknet/terms/evidences/",
    "bkr": "http://www.ondex.org/bioknet/resources/",
    "dcterms": "http://purl.org/dc/terms/",
    "obo": "http://purl.obolibrary.org/obo/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://www.ondex.org/bioknet/terms/",
    "dcterms:identifier": { "@type": "xsd:string" },
    "evidence": { "@type": "@id" }
  },
  …
  "@id": "bkr:TOB1",
  "@type": "bk:Protein",
  "prefName": "TOB1 Human",
  "dcterms:identifier": "TOB1",
  "is_annotated_by": "obo:GO_0030014",
  "participates_in": {
    "@id": "http://www.wikipathways.org/id1",
    "@type": "bk:Pathway",
    "evidence": "bkev:IMPD",
    "prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
  }
}
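The role of the @context can be illustrated with a small Java sketch that resolves keys and identifiers the way the context above suggests: compact IRIs such as bkr:TOB1 are expanded through their prefix, and bare terms such as prefName fall back to @vocab. This is a heavily simplified picture of JSON-LD expansion, not a JSON-LD processor; the class JsonLdTermExpander and its methods are hypothetical names, and type coercion (for example the "@type": "@id" entry) is not covered.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, for illustration only: resolves JSON-LD terms to full IRIs. */
final class JsonLdTermExpander {
    private final Map<String, String> prefixes = new HashMap<>();
    private final String vocab;

    /** vocab plays the role of the "@vocab" entry of the context. */
    JsonLdTermExpander(String vocab) {
        this.vocab = vocab;
    }

    /** Registers a prefix entry of the context, e.g. "bkr". */
    JsonLdTermExpander prefix(String name, String iri) {
        prefixes.put(name, iri);
        return this;
    }

    /** Compact IRIs use the prefix map; bare terms are appended to the vocabulary IRI. */
    String expand(String term) {
        int colon = term.indexOf(':');
        if (colon > 0) {
            String namespace = prefixes.get(term.substring(0, colon));
            if (namespace != null) return namespace + term.substring(colon + 1);
            return term; // no such prefix: treated as an already absolute IRI
        }
        return vocab + term;
    }

    public static void main(String[] args) {
        JsonLdTermExpander ctx = new JsonLdTermExpander("http://www.ondex.org/bioknet/terms/")
            .prefix("bkr", "http://www.ondex.org/bioknet/resources/")
            .prefix("dcterms", "http://purl.org/dc/terms/");
        System.out.println(ctx.expand("bkr:TOB1"));
        System.out.println(ctx.expand("dcterms:identifier"));
        System.out.println(ctx.expand("prefName"));
        System.out.println(ctx.expand("http://www.wikipathways.org/id1"));
    }
}
```

This is what lets a plain JSON consumer ignore the context altogether, while an RDF-aware client can map every key and identifier to the same URIs used in the triple store.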
JSON Schemas Babylon (and Our Focus)
Take-Home Messages
• From small data integration farm to sharing with the rest of the world => FAIR Principles
• Semantic Web has pros and cons
  • Still useful for data model and schema governance, identifiers, complex models (namely, ontologies)
• Alternative data sharing approaches, property graphs (PG) in particular
  • More alive area, can be simpler (blends into existing industrial software better)
  • LOD/FAIR principles not addressed much
  • Integrating the two is useful
• APIs are a useful alternative/complementary approach
  • LOD/FAIR principles to be addressed as well
• On our radar:
  • Complete the work, publishing SPARQL, Neo4j access, APIs
  • Integrating similar projects in the agrifood field (e.g. BrAPI, DFW)
  • Contribute to standardisation efforts like Bioschemas
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowledge Graphs

  • 1. Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowledge Graphs Harpenden, 3/6/2018
 Marco Brandizi <[email protected]> Find these slides on SlideShare KnetMiner-inspired Artwork
 by Hugo Dalton (hugodalton.com)
  • 2. Behind the scenes of KnetMiner
  • 3. Putting it on a Bigger Picture
  • 4. Putting it on a Bigger Picture
  • 5. <concept> <id>1</id> <pid>Q75WV3</pid> <description/> <elementOf> <idRef>UNIPROTKB-SwissProt</idRef> </elementOf> <ofType> <idRef>Protein</idRef> </ofType> <evidences> <evidence> <idRef>IMPD</idRef> </evidence> </evidences> <conames> <concept_name> <name>Probable trehalose-phosphate phosphatase 1</name> <isPreferred>true</isPreferred> </concept_name> … <cc> <id>Protein</id> <fullname>Protein</fullname> <description> A protein is comprised of one or more Polypeptides and potentially other molecules. </description> <specialisationOf> <idRef>MolCmplx</idRef> </specialisationOf> </cc> <relation> <fromConcept>1</fromConcept> <toConcept>3</toConcept> <ofType> <idRef>participates_in</idRef> </ofType> <evidences> <evidence> <idRef>ECO:0000316</idRef> </evidence> </evidences> <relgds/> </relation> <concept> <id>3</id> <pid>GO:0009651</pid> <description>response to salt stress</description> <ofType><idRef>BioProc</idRef></ofType> <coaccessions> <concept_accession> <accession>GO:0009651</accession> <elementOf><idRef>GO</idRef></elementOf> <ambiguous>false</ambiguous> </concept_accession> </coaccessions> </concept> Is XML/OXL Enough?
  • 6. A Brief History of Data Models/Formats
  • 7. The Semantic Web Approach: RDF
  • 8. The Semantic Web Approach: RDF
  • 9. URI Resolution @prefix bkr: <http://www.ondex.org/bioknet/resources/> . @prefix bk: <http://www.ondex.org/bioknet/terms/> . @prefix bka: <http://www.ondex.org/bioknet/terms/attributes/> . bkr:TOB1 a bk:Protein ; bk:participates_in <http://www.wikipathways.org/id1> ; bk:prefName "TOB1"; bk:published_in bkr:23236473.
 The Turtle Syntax: https://www.w3.org/TR/turtle/
  • 13. Sharing Identifiers via URIs Data store Schema store Wikipathways
  • 14. Mapping Data for Interoperability
  • 17. Our Data Model: The BioKNO Ontology
  • 18. BioKNO: Biological Entities
      wp:id1 a bk:Path ; # a subclass of bk:Concept
        bk:evidence bkev:IMPD ; # Imported from database, a predefined resource type.
        bk:prefName "Bone Morphogenic Protein (BMP) Signalling and Regulation".
      bkr:TOB1 a bk:Protein ;
        dc:identifier bkr:TOB1_acc ;
        bk:prefName "TOB1 HUMAN";
        # A simplified link, hiding the BioPax chain:
        # pathwayComponent -> BioChemicalReaction|Complex -> Protein
        bk:participates_in wp:id1;
        bk:is_annotated_by obo:GO_0030014. # Same URI as the OBO Gene Ontology Term.
      # Structured accession, allow for linking of identifier and context.
      bkr:TOB1_acc a bk:Accession ;
        dcterms:identifier "TOB1";
        # instance of bk:DataSource. Another predefined entity.
        bk:dataSource bkds:UNIPROTKB.
  • 19. Attributes in Reified Relations
      # For practical reasons, we always expect that the straight
      # triple is always asserted, with the
      # reified version optionally added to it.
      bkr:TOB1 bk:published_in bkr:20068231.
      bkr:citation_TOB1_15489334 a bk:Relation ;
        # the same properties that are used for regular relations
        bk:relTypeRef bk:published_in;
        bk:relFrom bkr:TOB1 ;
        bk:relTo bkr:15489334 ;
        # An attribute
        bka:score 0.95 ;
        # Both attributes and object properties can be linked to a
        # reified relation.
        bk:evidence bkev:TextMining.
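The pattern on this slide (a straight triple plus an optional reified `bk:Relation` that carries attributes) can be sketched as a small generator. This is an illustration only: the class and method names are hypothetical, the output is a simplified one-triple-per-line rendering rather than complete Turtle with prefix declarations, and the identifiers in `main` are just sample values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReifiedRelationSketch {

    // Emits the straight triple first, then the reified relation with its attributes.
    static List<String> triples(String relId, String from, String type, String to,
                                Map<String, String> attributes) {
        List<String> out = new ArrayList<>();
        // The straight triple is always asserted.
        out.add(from + " " + type + " " + to + " .");
        // The reified version repeats the same link as a resource of its own.
        out.add(relId + " a bk:Relation .");
        out.add(relId + " bk:relTypeRef " + type + " .");
        out.add(relId + " bk:relFrom " + from + " .");
        out.add(relId + " bk:relTo " + to + " .");
        // Attributes and object properties hang off the relation resource.
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            out.add(relId + " " + attr.getKey() + " " + attr.getValue() + " .");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("bka:score", "0.95");
        attrs.put("bk:evidence", "bkev:TextMining");
        triples("bkr:citation_TOB1_15489334", "bkr:TOB1", "bk:published_in",
                "bkr:15489334", attrs).forEach(System.out::println);
    }
}
```

Keeping the straight triple means simple queries never need to know about reification, while queries that filter on a score can go through the `bk:Relation` resource.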
  • 20. Talking to the Rest of The World
      BioKNO | External Ontologies | Mapping Type
      bk:Concept | skos:Concept | Subclass
      bk:Relation; bk:relFrom, bk:relTypeRef, bk:relTo | rdf:Statement; rdf:subject, rdf:predicate, rdf:object | Subclass; Subproperties (ie, mapping to RDF reified statements)
      bk:Path, bk:Participant, bk:Interaction, bk:Transport, bk:Protein, bk:Gene | Classes with same names in BioPAX and SIO | Equivalent Class
      bk:participates_in, bk:has_participant | Relation Ontology (RO) properties with same names; biopax:participant (as sub-property) | Equivalent property
      bk:produces, bk:produced_by, bk:consumes, bk:consumed_by | biopax:product (as sub-property); RO properties with same names | Equivalent property
      bk:regulates, bk:positively_regulates, bk:negatively_regulates | RO properties with same names | Equivalent property
      bk:is_a, bk:part_of, bk:has_part, bk:occurs_in, bk:co_occurs_with | skos:broader; Basic Formal Ontology (BFO)/RO properties with same names | Equivalent property
      bk:Publication | schema:CreativeWork | Subclass
      bka:abstract, bka:title (also known as AbstractHeader), bka:authors | dcterms:description, dcterms:title, dc:creator | Sub-property
  • 22. How to Serve and Query RDF?
  • 23. Typical RDF (and Data) Architecture
  • 24. How to Use it, Concretely? Playground: SPARQL Browsers
  • 27. How to Use it, Concretely? Programmatically: RDF Frameworks (Jena in this case)
  • 28. How to Use it, Concretely? Programmatically: RDF Frameworks (Jena in this case)
  • 30. How to Use it, Concretely? Programmatically: RDF Frameworks (Jena in this case)
      String service = "http://localhost:3030/ds/query";
      String sparql =
        "PREFIX bk: <http://www.ondex.org/bioknet/terms/>\n" +
        …
        "\n" +
        "\n" +
        "SELECT DISTINCT ?pmid ?title ?year ?pub \n" +
        "{\n" +
        "  ?prot a bk:Protein;\n" +
        "    bk:prefName 'TOB1'.\n" +
        "  \n" +
        "  ?pubRel a bk:Relation;\n" +
        "    bk:relFrom ?prot;\n" +
        "    bk:relTo ?pub;\n" +
        "    bka:Score ?score.\n" +
        "  \n" +
        "  FILTER ( ?score > 0.90 )\n" +
        "  \n" +
        "  ?pub \n" +
        "    bka:PMID ?pmid ;\n" +
        "    bka:YEAR ?dyear;\n" +
        "    bka:abstractHeader ?title\n" +
        "\n" +
        "  BIND ( xsd:int ( ?dyear ) AS ?year )\n" +
        "}\n" +
        "LIMIT 1000";
      Query query = QueryFactory.create ( sparql );
      QueryEngineHTTP qexec = QueryExecutionFactory.createServiceRequest ( service, query );
      ResultSet results = qexec.execSelect ();
      results.forEachRemaining ( (QuerySolution soln ) -> {
        Resource pubNode = soln.getResource ( "pub" );
        String uri = pubNode.getURI ();
        Literal titleNode = soln.getLiteral ( "title" );
        String title = titleNode.getString ();
        String titleLang = titleNode.getLanguage ();
        Literal yearNode = soln.getLiteral ( "year" );
        int year = yearNode.getInt ();
        System.out.format (
          "Publication ID: <%s>, title: %s (in %s), year: %d\n",
          uri, title, titleLang, year
        );
      });
  • 31. SPARQL for Extraction, Loading, Transformation (The Simpler-than-Ondex Way)
      CONSTRUCT {
        ?path a bk:Path;
          bk:prefName ?pathName;
          bk:evidence bkev:IMPD.
        ?bkProt a bk:Protein;
          dc:identifier ?bkProtAccUri;
          bk:prefName ?protName;
          bk:participates_in ?path.
        ?bkProtAccUri a bk:Accession;
          dcterms:identifier ?protName;
          bk:dataSource bkds:UNIPROTKB.
      }
      WHERE {
        ?path a bp:Pathway;
          bp:displayName ?pathName;
          bp:pathwayComponent ?comp.
        { ?comp a bp:BiochemicalReaction;
            bp:left|bp:right ?protein. }
        UNION
        { ?comp a bp:Complex;
            bp:component ?protein. }
        ?protein a bp:Protein;
          bp:displayName ?protName.
        BIND ( IRI ( CONCAT ( STR ( bkr: ), STR ( ?protName ) ) ) AS ?bkProt )
        BIND ( IRI ( CONCAT ( STR ( ?bkProt ), "_acc" ) ) AS ?bkProtAccUri )
      }
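The two `BIND` clauses in this query mint new URIs by string concatenation. The sketch below mirrors that logic in plain Java, and adds a percent-encoding variant as an assumption on our side: the slide concatenates the raw display name, which would yield an invalid IRI for names containing spaces. Class and method names are hypothetical, and `URLEncoder` is only an approximation of SPARQL's `ENCODE_FOR_URI`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class IriMintingSketch {

    // Namespace bound to the bkr: prefix on the slides.
    static final String BKR = "http://www.ondex.org/bioknet/resources/";

    // Mirrors IRI ( CONCAT ( STR ( bkr: ), STR ( ?protName ) ) ).
    static String proteinIri(String protName) {
        return BKR + protName;
    }

    // Mirrors IRI ( CONCAT ( STR ( ?bkProt ), "_acc" ) ).
    static String accessionIri(String proteinIri) {
        return proteinIri + "_acc";
    }

    // Assumption, not on the slide: encode the name first so that spaces and
    // other reserved characters cannot break the resulting IRI.
    static String safeProteinIri(String protName) {
        String encoded = URLEncoder.encode(protName, StandardCharsets.UTF_8)
            .replace("+", "%20"); // URLEncoder writes spaces as '+'
        return BKR + encoded;
    }

    public static void main(String[] args) {
        String prot = proteinIri("TOB1");
        System.out.println(prot);
        System.out.println(accessionIri(prot));
        System.out.println(safeProteinIri("TOB1 HUMAN"));
    }
}
```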
  • 33. SPARQL/RDF for ELT
      • TARQL: Using SPARQL to RDF-Convert Tabular CSV Files
      • RDF/XML can be transformed via XSL
        • We have done it for bio-specific ontology definitions in Ondex
      • Programmatic conversions
        • Using RDF frameworks, eg, Jena, RDF4J (former Sesame), rdflib for Python
        • See also java2rdf (https://github.com/EBIBioSamples/java2rdf)
          • We have used it for the Ondex->RDF converter
  • 39. The Bigger Picture: Linked Open Data Artificial Intelligence (AI) 8 https://lod-cloud.net/
  • 40. In the Life Sciences
  • 43. The Cypher Query/DML Language
      Proteins->Reactions->Pathways:
      // chain of paths, node selection via property (exploits indices)
      MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: 'apoptosis' })
      // further conditions, not always so performant
      WHERE prot.name =~ '(?i)^DNA.+'
      // Usual projection and post-selection operators
      RETURN prot.name, pway
      // Relations can have properties
      ORDER BY csby.pvalue
      LIMIT 1000
      Proteins->Reactions->Pathways:
      // Single-path (or same-direction branching) easy to write
      MATCH (prot:Protein) - [:produced_by|consumed_by] -> (:Reaction)
        - [:part_of*1..3] -> (pway:Path)
      RETURN ID(prot), ID(pway) LIMIT 1000
      // Very compact forms available, depending on the data
      MATCH (prot:Protein) - (pway:Path) RETURN pway
  • 44. Cypher as Semantic Motif Language
  • 45. Cypher as Semantic Motif Language
  • 49. The rdf2neo Tool
      SELECT ?iri {
        ?label rdfs:subClassOf* bk:Concept.
        ?iri a ?label.
      }
      SELECT ?label {
        { ?iri a ?label.
          ?label rdfs:subClassOf* bk:Concept. }
        UNION {
          # it's always instance of concept
          BIND ( bk:Concept AS ?label )
          BIND ( ?iri AS ?iri )
        }
      }
      SELECT ?name ?value {
        { ?iri ?name ?value.
          VALUES ( ?name ) {
            (dcterms:identifier) (dcterms:description) (rdfs:comment)
            (bk:prefName) (bk:altName)
          }
        }
        UNION {
          ?iri ?name ?value.
          ?name rdfs:subPropertyOf* bk:attribute.
        }
      }
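To show what the label and property queries above compute for one node, here is a small in-memory sketch over a list of (subject, predicate, object) rows. It is not the rdf2neo implementation: the class and method names are hypothetical, triples are plain string arrays, and the property step uses a fixed whitelist instead of also following `rdfs:subPropertyOf*` to `bk:attribute`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class NodeMappingSketch {

    // True if cls reaches ancestor via zero or more rdfs:subClassOf steps.
    static boolean isSubClassOf(String cls, String ancestor, List<String[]> triples) {
        Deque<String> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.push(cls);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (current.equals(ancestor)) return true;
            if (!seen.add(current)) continue; // already visited, avoids cycles
            for (String[] t : triples) {
                if (t[0].equals(current) && t[1].equals("rdfs:subClassOf")) todo.push(t[2]);
            }
        }
        return false;
    }

    // Labels for a node: every type under bk:Concept, plus bk:Concept itself.
    static Set<String> labels(String iri, List<String[]> triples) {
        Set<String> labels = new TreeSet<>();
        labels.add("bk:Concept"); // the UNION branch: always an instance of concept
        for (String[] t : triples) {
            if (t[0].equals(iri) && t[1].equals("rdf:type")
                    && isSubClassOf(t[2], "bk:Concept", triples)) {
                labels.add(t[2]);
            }
        }
        return labels;
    }

    // Node properties: values of whitelisted predicates only (simplification).
    static Map<String, String> properties(String iri, List<String[]> triples,
                                          Set<String> allowed) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String[] t : triples) {
            if (t[0].equals(iri) && allowed.contains(t[1])) props.put(t[1], t[2]);
        }
        return props;
    }

    public static void main(String[] args) {
        List<String[]> triples = List.of(
            new String[] {"bk:Path", "rdfs:subClassOf", "bk:Concept"},
            new String[] {"wp:id1", "rdf:type", "bk:Path"},
            new String[] {"wp:id1", "bk:prefName", "BMP Signalling"},
            new String[] {"wp:id1", "bk:evidence", "bkev:IMPD"});
        System.out.println(labels("wp:id1", triples));
        System.out.println(properties("wp:id1", triples, Set.of("bk:prefName")));
    }
}
```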
  • 51. How to Use it, Concretely? Playground: The Neo4j Browser
  • 52. How to Use it, Concretely? Programmatically: The Neo4j Drivers (for Java in this case)
  • 53. How to Use it, Concretely? Programmatically: The Neo4j Drivers (for Java in this case)
      AuthToken auth = AuthTokens.basic ( "neo4j", "test" );
      try (
        Driver neodb = GraphDatabase.driver ( "bolt://127.0.0.1:7687", auth );
        Session session = neodb.session ();
      )
      {
        String cypher =
          "MATCH (prot:Protein{ prefName:'TOB1' }) - [r:published_in] -> (pub)\n" +
          "WHERE toFloat ( r.Score ) > 0.9\n" +
          "RETURN pub.PMID, pub.AbstractHeader, pub.YEAR\n" +
          "ORDER BY pub.YEAR DESC\n" +
          "LIMIT 30";
        Statement stmt = new Statement ( cypher );
        StatementResult rs = session.run ( stmt );
        rs.forEachRemaining ( rec -> {
          String pmid = rec.get ( "pub.PMID" ).asString ();
          String title = rec.get ( "pub.AbstractHeader" ).asString ();
          String year = rec.get ( "pub.YEAR" ).asString ();
          System.out.format (
            "PMID: %s, Title: \"%s\", year: %s\n", pmid, title, year
          );
        });
      }
  • 54. Triple Stores vs Prop Graphs
      Aspect | Neo4j, Cypher DBs, Graph DBs | Semantic Web/Triple Stores
      Data xchg format | - No official one, just Cypher, support for GraphML, RDF; +/- Focus on backing applications | + Focus on data sharing standards
      Data model | + Relations with properties; - Metadata/schemas/ontologies management | - Relations cannot have properties (reification required); + Metadata/schemas/ontologies as first citizen and standardised OWL
      Performance | + Complex graph traversals | + Comparable in most cases
      Query Language | + Cypher is easier (eg, compact, implicit elems)?; - Expressivity issues (unions); - No standard QL (but efforts in progress, eg, OpenCypher) | - SPARQL is harder? (URIs, namespaces, verbosity); + SPARQL more expressive
      Standardisation, openness | +/- (TinkerPop is open, Neo4j isn't); + Commercial support; + More alive and up-to-date (e.g., support for Hadoop, nice Neo4j browser, easy installation) | + Natively open, many open implementations; - Instability and many short-lived prototypes; - Advancements seem to be slowing down; + Some nice open and commercial browser (LODEStar,
      Scalability, big data | +/- Commercial support to clustering/clouds for Neo4j; + Open support in TinkerPop | + Load Balancing/Cluster solutions, Commercial Cloud support (eg GraphDB); + SPARQL Over TinkerPop (via SAIL interface)
  • 55. Supporting Web APIs via JSON
      {
        "type": "Protein",
        "id": "TOB1",
        "prefName": "TOB1 Human",
        "participates_in": {
          "type": "Pathway",
          "id": "id1",
          "evidence": "IMPD",
          "prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
        },
        "is_annotated_by": "GO_0030014"
      }
      • Designed to be compatible with browsers, i.e., JavaScript
      • Language of choice for web APIs, consumption by web browsers, dynamic web interfaces (i.e., AJAX)
      • Conceptually similar to XML (trees, nested structures)
      • Often used in a lightweight way, without many schema constraints
  • 57. Bridging to RDF: JSON-LD
      {
        "@context": {
          "bk": "http://www.ondex.org/bioknet/terms/",
          "bka": "http://www.ondex.org/bioknet/terms/attributes/",
          "bkds": "http://www.ondex.org/bioknet/terms/dataSources/",
          "bkev": "http://www.ondex.org/bioknet/terms/evidences/",
          "bkr": "http://www.ondex.org/bioknet/resources/",
          "dcterms": "http://purl.org/dc/terms/",
          "obo": "http://purl.obolibrary.org/obo/",
          "xsd": "http://www.w3.org/2001/XMLSchema#",
          "@vocab": "http://www.ondex.org/bioknet/terms/",
          "dcterms:identifier": { "@type": "xsd:string" },
          "evidence": { "@type": "@id" }
        },
        …
        "@id": "bkr:TOB1",
        "@type": "bk:Protein",
        "prefName": "TOB1 Human",
        "dcterms:identifier": "TOB1",
        "is_annotated_by": "obo:GO_0030014",
        "participates_in": {
          "@id": "http://www.wikipathways.org/id1",
          "@type": "bk:Pathway",
          "evidence": "bkev:IMPD",
          "prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
        }
      }
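The `@context` on this slide is what turns ordinary JSON keys into RDF terms. As a rough sketch of that step only, the code below expands a key or value the way a JSON-LD processor would for this context: plain terms fall back to `@vocab`, compact IRIs use their declared prefix, and anything with an undeclared prefix is treated as an absolute IRI. It covers a small part of the JSON-LD expansion algorithm; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class JsonLdTermSketch {

    // Value of "@vocab" in the slide's context.
    static final String VOCAB = "http://www.ondex.org/bioknet/terms/";

    // A subset of the prefixes declared in the slide's context.
    static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("bk", "http://www.ondex.org/bioknet/terms/");
        PREFIXES.put("bkev", "http://www.ondex.org/bioknet/terms/evidences/");
        PREFIXES.put("bkr", "http://www.ondex.org/bioknet/resources/");
        PREFIXES.put("dcterms", "http://purl.org/dc/terms/");
        PREFIXES.put("obo", "http://purl.obolibrary.org/obo/");
    }

    static String expand(String term) {
        if (term.startsWith("@")) {
            return term; // keywords such as @id and @type are left untouched
        }
        int colon = term.indexOf(':');
        if (colon < 0) {
            return VOCAB + term; // plain term: resolved against @vocab
        }
        String namespace = PREFIXES.get(term.substring(0, colon));
        if (namespace == null) {
            return term; // undeclared prefix: already an absolute IRI
        }
        return namespace + term.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(expand("prefName"));
        System.out.println(expand("obo:GO_0030014"));
        System.out.println(expand("http://www.wikipathways.org/id1"));
    }
}
```

With these rules the JSON body stays readable for web developers, while an RDF-aware consumer sees the same `bk:` and `obo:` URIs as in the Turtle examples.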
  • 58. JSON Schemas Babylon (and Our Focus)
  • 63. Take-Home Messages
      • From small data integration farm to sharing with the rest of the world => FAIR Principles
      • Semantic Web has pros and cons
        • Still useful for data model and schema governance, identifiers, complex models (namely, ontologies)
      • Alternative data sharing approaches, PG in particular
        • More alive area, can be simpler (blends into existing industrial software better)
        • LOD/FAIR principles not addressed much
        • Integrating the two is useful
      • APIs are a useful alternative/complementary approach
        • LOD/FAIR principles to be addressed as well
      • On our radar:
        • Complete the work, publishing SPARQL, Neo4j access, APIs
        • Integrating similar projects in the agrifood field (e.g. BrAPI, DFW)
        • Contribute to standardisation efforts like Bioschemas