Bio2RDF presentation at Combine 2012

Bio2RDF's namespace
SPARQL endpoint

François Belleau
Centre de Biologie Computationnelle
du CRCHUQ

Do you know them?

How can we help people navigate the
huge cloud of bioinformatics databases?

[Figure: BioPAX.gif (2005) next to an image of
Tim Berners-Lee's vision of the Semantic Web]

Databases of database names
● PathGuide
● Bioinformatics.ca Links Directory
● Annual NAR Database issue
● GO, UniProt, GenBank cross-reference lists
● LSRN initiative
● MIRIAM EBI project
● BioPAX dataprovider community
● Bio2RDF Linked Data space

Two interesting questions
● Which namespaces are the most popular for
identifying a database?

● How close is the BioPAX community to adopting
the new MIRIAM namespace standard?

Which namespaces are the most
popular for identifying a database?

Collection of namespaces used by the BioPAX
data provider community.

How close is the BioPAX community to adopting
the new MIRIAM namespace standard?

How did we do this?

To answer a complex question
we first need to build the
database that will potentially
answer it: a semantic mashup.

A mashup, in web development, is a web page,
or web application, that uses and combines
data, presentation or functionality from two or
more sources to create new services. The term
implies easy, fast integration, frequently using
open Application programming interfaces (API)
and data sources to produce enriched results
that were not necessarily the original reason for
producing the raw source data.

http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)

Building a mashup is a lot easier when
using Semantic Web technologies like
RDF and SPARQL, which are designed for data
interoperability.

A four-step method
● Get the data from the data provider and
transform it into RDF; we use Talend, an open
source, Eclipse-based ETL tool.
● Load the data into a triplestore; many are
available (Virtuoso, Sesame, Jena, 4store,
Mulgara, etc.) to host your mashup.
● Explore the new dataset using a specialised user
interface (RelFinder, Virtuoso facet browser).
● Design your SPARQL query and get the answer.
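
The method above can be sketched in miniature with plain Python. This is only an illustration under stated assumptions: the provider/namespace pairs, the `http://example.org/` URI base, and the `usesNamespace` predicate are all hypothetical, a Python set stands in for a real triplestore, and the exploration step is skipped.

```python
from collections import Counter

# Hypothetical records standing in for what two data providers might publish.
SOURCE_A = [("reactome", "uniprot"), ("reactome", "chebi")]
SOURCE_B = [("biocyc", "uniprot"), ("biocyc", "kegg")]

BASE = "http://example.org/"  # placeholder URI base, not a Bio2RDF URI
USES = BASE + "vocab/usesNamespace"  # hypothetical predicate


def to_triples(records):
    """Transform (provider, namespace) pairs into (subject, predicate, object) triples."""
    return {(BASE + "provider/" + p, USES, BASE + "ns/" + n) for p, n in records}


def load(*graphs):
    """Load: merging RDF graphs amounts to a set union of their triples."""
    store = set()
    for graph in graphs:
        store |= graph
    return store


def namespace_popularity(store):
    """Query: count how many providers use each namespace."""
    return Counter(o.rsplit("/", 1)[-1] for s, p, o in store if p == USES)


store = load(to_triples(SOURCE_A), to_triples(SOURCE_B))
popularity = namespace_popularity(store)
print(popularity.most_common(1))  # [('uniprot', 2)]
```

Because triples from different sources share one shape, combining them needs no schema mapping, which is the point of using RDF for a mashup.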

Data provider xref resources
● http://www.ebi.ac.uk/miriam/main/ (XML format)
● Bio2RDF DNS zone description file (text)
● http://lsrn.org/ (RDF/XML)
● http://www.geneontology.org/doc/GO.xrf_abbs
(key/value format)
● http://www.uniprot.org/docs/dbxref (key/value format)
● http://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/
(HTML)
● 12 BioPAX providers (Reactome, BioModels, BioCyc,
PANTHER, INOH, etc.)

Lesson #1

Produce RDF triples with a
professional ETL tool.

Talend: free, open source
ETL software

http://www.talend.com/index.php

Talend workflows to convert the GenBank HTML
page and the MIRIAM XML into triples
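
For readers without Talend, the same kind of XML-to-triples conversion can be sketched with the Python standard library. The XML layout below is a simplified, hypothetical stand-in and not the actual MIRIAM schema; the `http://example.org/ns/` base is also an assumption. Only the `rdfs:label` URI and the N-Triples line syntax are standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified registry export (NOT the real MIRIAM XML schema).
SAMPLE_XML = """
<registry>
  <datatype><name>Example DB</name><namespace>exampledb</namespace></datatype>
  <datatype><name>Other "quoted" DB</name><namespace>otherdb</namespace></datatype>
</registry>
"""

BASE = "http://example.org/ns/"  # placeholder URI base
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def xml_to_ntriples(xml_text):
    """Emit one N-Triples line (namespace URI, rdfs:label, name) per <datatype>."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for entry in root.iter("datatype"):
        ns = entry.findtext("namespace").strip()
        name = entry.findtext("name").strip()
        lines.append(f'<{BASE}{ns}> <{RDFS_LABEL}> "{escape_literal(name)}" .')
    return lines


ntriples = xml_to_ntriples(SAMPLE_XML)
for line in ntriples:
    print(line)
```

The resulting lines can be saved to a file and bulk-loaded by most triplestores, since N-Triples is a widely supported exchange format.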

Lesson #2

Publish with a SPARQL endpoint.
(and get a free 5-star cup)

Load RDF triples into a triplestore
(we use OpenLink Virtuoso)

http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/

Full text search

http://namespace.bio2rdf.org/fct/

Discover entity names
and browse the triplestore

Lesson #3

Consume as you like.
HTTP GET to obtain RDF from URI,
SPARQL endpoint,
SOAP services returning RDF,
new Semantic Web software...
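
Two of these consumption routes can be sketched with the Python standard library. The requests are only built here, not sent. The endpoint and resource URLs are placeholders (substitute a real endpoint, such as the one on the following slides); the `query` parameter and the `Accept` media types follow the SPARQL protocol and common Linked Data practice.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"        # placeholder endpoint URL
RESOURCE = "http://example.org/ns/exampledb"  # placeholder resource URI
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"


def build_uri_request(uri):
    """HTTP GET on a resource URI, asking for RDF/XML through content negotiation."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


def build_sparql_request(endpoint, query):
    """SPARQL protocol query via GET: the query text goes in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


uri_request = build_uri_request(RESOURCE)
sparql_request = build_sparql_request(ENDPOINT, QUERY)
print(sparql_request.get_method(), sparql_request.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs offline.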

The SPARQL query needed
to draw the previous graph
using the ManyEyes service

http://namespace.bio2rdf.org/sparql
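
The slide's actual query is not reproduced in this transcript. As a rough sketch of what a namespace popularity query and the handling of its answer could look like, the example below uses a hypothetical `ex:usesNamespace` predicate and an invented sample response; the numbers are not results from this work. Only the SPARQL JSON results layout (`head.vars`, `results.bindings`) is standard.

```python
import json

# Hypothetical query: the vocabulary is an assumption, not the Bio2RDF schema.
QUERY = """
PREFIX ex: <http://example.org/vocab/>
SELECT ?namespace (COUNT(DISTINCT ?provider) AS ?providers)
WHERE { ?provider ex:usesNamespace ?namespace }
GROUP BY ?namespace
ORDER BY DESC(?providers)
"""

# Invented sample response in the standard SPARQL JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["namespace", "providers"]},
    "results": {"bindings": [
        {"namespace": {"type": "literal", "value": "uniprot"},
         "providers": {"type": "literal", "value": "7"}},
        {"namespace": {"type": "literal", "value": "chebi"},
         "providers": {"type": "literal", "value": "3"}},
    ]},
})


def rows_from_results(text):
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    doc = json.loads(text)
    names = doc["head"]["vars"]
    return [{v: b[v]["value"] for v in names if v in b}
            for b in doc["results"]["bindings"]]


rows = rows_from_results(SAMPLE_RESPONSE)
table = [(row["namespace"], int(row["providers"])) for row in rows]
print(table)
```

A two-column table like this is the kind of input a charting service needs to draw a popularity graph.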

Use a SOAP service

http://namespace.bio2rdf.org/bio2rdf/services.wsdl

Discover your relations
graphically with RelFinder

http://www.visualdataweb.org/relfinder.php

Conclusion

● Building a mashup is easy with current
software, but we still need the RDF data.
● One SPARQL query in the proper triplestore
(Bio2RDF's namespace mashup) could answer
our two initial questions.
● Why not consider publishing your own SPARQL
endpoint to make semantic hackers' lives
easier?

Acknowledgements

● Bio2RDF is a community project available at http://bio2rdf.org
● The community can be joined at
https://groups.google.com/forum/?fromgroups#!forum/bio2rdf
● This work was done under the supervision of Dr Arnaud Droit, assistant
professor and director of the Centre de Biologie Computationnelle du
CRCHUQ at Laval University, where a mirror of Bio2RDF is hosted.
● Michel Dumontier, from the Dumontier Lab at Carleton University, also
hosts a Bio2RDF server and currently leads the project.
● Thanks to all members of the Bio2RDF community, and especially
Marc-Alexandre Nolin and Peter Ansell, the initial developers.

Come to Montreal in July 2013 with your
SPARQL endpoint and get a FREE cup!

http://www.unbsj.ca/sase/csas/data/semantic-trilogy-2013/
