BioSamples Database Linked Data 
(2014 edition) 
Marco Brandizi, 
Functional Genomics Team 
SWAT4LS Tutorial, Dec 9th, 2014 
Find this presentation on 
http://www.slideshare.net/mbrandizi
Why a BioSamples Database (aka BioSD)? 
• A reference system for searching and browsing information about biological
samples that are used, or usable, in biomedical experiments
• Focused on the sample context (i.e., independent of the specific assay
type/technology)
• Supports heterogeneous experiments
– A single place that assay repositories can link to (reference samples; an
authoritative source for repositories such as
Metagenomics/ENA/ArrayExpress)
– A single place for searches and for related-to or same-as relationships
(e.g., see the 'myEquivalents' project) 
• Common interfaces to access sample information and links to specific 
data/repositories (e.g., web, XML/REST, RDF)
Why Linked Data for BioSD? 
• Potentially useful to application developers and Linked Data tools 
• Integration with similar/related data-sets 
• Exploitation of ontologies 
– Standardisation 
– A little semantics goes a long way 
– Improved searching 
• As usual, open to unexpected uses
– e.g., http://www.phyloviz.net/NGSonto
The BioSD Model 
Sample Groups 
Submission 
External links 
Samples 
http://www.ebi.ac.uk/biosamples
The BioSD Model 
Group's (or Submission's) samples 
Sample's (or Groups') attribute types 
and values 
External links
Changes to Linked Data Model 
• Main Entities: http://tinyurl.com/lo33ncc
• Details about Sample Attributes: http://tinyurl.com/n5oyvyd
Several improvements to the conversion software, aiming at more frequent auto-updates
SPARQL Queries
Find Samples and attributes 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE
{
  ?smp
    a biosd-terms:Sample;
    biosd-terms:has-bio-characteristic | sio:SIO_000332 ?pv. # is about
  ?pv
    rdfs:label ?pvLabel;
    biosd-terms:has-bio-characteristic-type ?pvType.
  ?pvType
    rdfs:label ?propTypeLabel.
}
• Exercise: use FILTER()/REGEX() to find samples with organism = Homo sapiens
• Exercise: find the samples' repositories of provenance and their links
– Hint: explore the sample's links (?smp) and see what a RepositoryWebRecord
looks like
Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
Exercise solution: see the examples on that page
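One possible way into the first exercise, as a sketch rather than the official solution: the slide's query with two FILTERs added, wrapped in a small Python helper that builds a SPARQL Protocol GET URL. Matching on the labels "organism" and "homo sapiens" is an assumption about how the data is labelled, `sparql_get_url` is a hypothetical helper name, and the endpoint is the one printed on the slide, which may no longer be served.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as printed on the slide; it may no longer be served.
ENDPOINT = "http://www.ebi.ac.uk/rdf/services/biosamples/sparql"

# The slide's query plus two FILTERs. The label patterns are assumptions
# about the data, not something the slides confirm.
ORGANISM_QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX sio: <http://semanticscience.org/resource/>

SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE {
  ?smp a biosd-terms:Sample;
       biosd-terms:has-bio-characteristic | sio:SIO_000332 ?pv.
  ?pv rdfs:label ?pvLabel;
      biosd-terms:has-bio-characteristic-type ?pvType.
  ?pvType rdfs:label ?propTypeLabel.
  FILTER ( REGEX ( ?propTypeLabel, "^organism$", "i" ) )
  FILTER ( REGEX ( ?pvLabel, "homo sapiens", "i" ) )
}
LIMIT 100
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the URL; fetching it is left to the reader's HTTP client.
    print(sparql_get_url(ENDPOINT, ORGANISM_QUERY))
```

The LIMIT is there only to keep a first try cheap; drop it once the filters return what you expect.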
Samples about a given organism 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE {
  ?smp biosd-terms:has-bio-characteristic ?pv.
  ?pv biosd-terms:has-bio-characteristic-type ?pvType;
      rdfs:label ?pvLabel.
  ?pvType a ?pvTypeClass.
  # Listeria
  ?pvTypeClass
    rdfs:label ?propTypeLabel;
    # '*' gives you transitive closure, even when inference is disabled
    rdfs:subClassOf* <http://purl.obolibrary.org/obo/NCBITaxon_1637>
}
• Exercise: use the BioPortal service to first find all subclasses of 'alcohol' (obo:CHEBI_30879)
and then search for samples annotated with those subclasses
– Hint: use SERVICE <http://sparql.bioontology.org/ontologies/sparql/?apikey=KEY>
Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
Exercise solution: see one of the examples on that page
Geo-located Samples/Sample Groups 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT DISTINCT ?item ?latVal ?longVal WHERE {
  ?item biosd-terms:has-bio-characteristic ?latPv, ?longPv.
  ?latPv
    biosd-terms:has-bio-characteristic-type [ rdfs:label ?latLabel ];
    sio:SIO_000300 ?latVal. # sio:has value
  FILTER ( REGEX ( ?latLabel, "latitude", "i" ) ).
  ?longPv
    biosd-terms:has-bio-characteristic-type [ rdfs:label ?longLabel ];
    sio:SIO_000300 ?longVal. # sio:has value
  FILTER ( REGEX ( ?longLabel, "longitude", "i" ) ).
}
• Exercise: find all samples having an attribute of type temperature, with a numerical value and a unit
specified. Hint: use sio:SIO_000221 (has unit), sio:SIO_000300 (has value)
• Exercise: find samples/groups annotated with intervals, which use the properties biosd-terms:has-low-value
and biosd-terms:has-high-value and optionally have a unit.
Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
Exercise solutions: see the examples on that page
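The geo-location query above returns ?item, ?latVal and ?longVal. If the endpoint is asked for the standard SPARQL JSON results format, the bindings could be turned into numeric points along these lines. This is a sketch under assumptions: `geo_points` is a hypothetical helper, the sample document is hand-written with made-up example.org URIs, and values that do not parse as numbers are simply skipped.

```python
import json
from typing import List, Tuple


def geo_points(results: dict) -> List[Tuple[str, float, float]]:
    """Return (item URI, latitude, longitude) for each usable binding."""
    points = []
    for row in results["results"]["bindings"]:
        try:
            lat = float(row["latVal"]["value"])
            lon = float(row["longVal"]["value"])
        except (KeyError, ValueError):
            continue  # unbound variable or non-numeric literal
        # Keep only values inside the valid coordinate ranges.
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            points.append((row["item"]["value"], lat, lon))
    return points


# Hand-written sample in SPARQL JSON results format (not real BioSD data).
SAMPLE = json.loads("""
{
  "head": {"vars": ["item", "latVal", "longVal"]},
  "results": {"bindings": [
    {"item":    {"type": "uri", "value": "http://example.org/sample/1"},
     "latVal":  {"type": "literal", "value": "52.2"},
     "longVal": {"type": "literal", "value": "0.12"}},
    {"item":    {"type": "uri", "value": "http://example.org/sample/2"},
     "latVal":  {"type": "literal", "value": "not recorded"},
     "longVal": {"type": "literal", "value": "0.0"}}
  ]}
}
""")

if __name__ == "__main__":
    print(geo_points(SAMPLE))
```

Filtering on the client side like this is a design choice for the sketch; the same check could be attempted in the query itself with a numeric cast and FILTER.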
Expressed Genes and Samples 
• For http://purl.uniprot.org/uniprot/P04637 (P53 in human)
• Find the EFO classes for which it is up-regulated in the Atlas (p-value < 1E-9)
• And show the Atlas expression value label. Hints:
– Start from the example http://tinyurl.com/kvvhw6b
– Use the Atlas endpoint: http://www.ebi.ac.uk/rdf/services/atlas/sparql
• Find the samples having attributes that are instances of such EFO classes
• and that come from a repository other than 'ArrayExpress'
• Hints:
– Use SERVICE <http://www.ebi.ac.uk/rdf/services/biosamples/sparql> and a sub-query
– Search for property values linked to property types that are instances of the experimental factors found by the
Atlas
– Then link to the samples, the samples to the submissions, the submissions to the web
records
● OR JUST HAVE A LOOK: http://goo.gl/kOfE1r (will take a while...)
New Ideas and the Like
Geo-Samples, Google Map Integration 
• Exercise: from geo-located samples to Google Maps. Think about how to do it:
● Google Maps supports the KML format (https://developers.google.com/kml)
● You can type a KML-returning URL into maps.google.com
(or pass it via GET, q=<kml-url>)
● The SPARQL endpoint can return results in XML format
● There are online XSLT services: http://services.w3.org/xslt?xslfile=<url>&xmlfile=<url>
http://tinyurl.com/kzd2pg4
http://tinyurl.com/lf2623l
http://tinyurl.com/lltqy2u
http://goo.gl/maps/CMRrk
Many thanks to Costanza Romano
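The slide's route goes from SPARQL XML results through an online XSLT to KML. As an alternative illustration of the last step only, here is a sketch that builds the KML directly in Python instead of XSLT. `to_kml` is a hypothetical helper, the input is assumed to be (name, latitude, longitude) triples, and the example.org URI is made up.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(points) -> str:
    """Turn (name, latitude, longitude) triples into a KML document string."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (note the order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(to_kml([("http://example.org/sample/1", 52.2, 0.12)]))
```

The longitude-first ordering is the detail most easily got wrong when mapping ?latVal and ?longVal onto placemarks.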
Search-by-Feature Similarity (ongoing) 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
SELECT DISTINCT ?smp ?smpDescr (COUNT (DISTINCT ?pv) AS ?score)
WHERE {
  {
    ?smp a biosd-terms:Sample;
         rdfs:comment ?smpDescr.
    ?smp biosd-terms:has-bio-characteristic ?pv.
    ?pv biosd-terms:has-bio-characteristic-type ?pvType.
    ?pvType a <http://purl.obolibrary.org/obo/NCBITaxon_10090>.
  } UNION {
    ?smp a biosd-terms:Sample;
         rdfs:comment ?smpDescr.
    ...
    ?pvType a <http://purl.obolibrary.org/obo/NCBITaxon_10090>.
  } UNION ...
}
GROUP BY ?smp ?smpDescr
HAVING (COUNT (DISTINCT ?pv) > 0)
ORDER BY DESC (COUNT (DISTINCT ?pv))
• Many thanks to AbdulShakur Abdullah, Eric Hillaert, Prasad Nuli
(https://github.com/CapStoneEBI2014/biosd_similarity_search)
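The query above repeats one UNION branch per ontology class, so it lends itself to being generated. The following is a sketch, not the code from the linked repository: `similarity_query` is a hypothetical helper, and it assumes that every branch has the shape of the first branch on the slide and differs only in the class URI (the slide elides the other branches).

```python
PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
"""

# One UNION branch; doubled braces are literal braces for str.format().
BRANCH = """\
  {{
    ?smp a biosd-terms:Sample;
         rdfs:comment ?smpDescr.
    ?smp biosd-terms:has-bio-characteristic ?pv.
    ?pv biosd-terms:has-bio-characteristic-type ?pvType.
    ?pvType a <{cls}>.
  }}"""


def similarity_query(class_uris) -> str:
    """Build a query scoring samples by how many of the classes they match."""
    uris = list(class_uris)
    if not uris:
        raise ValueError("at least one class URI is required")
    for uri in uris:
        # Refuse characters that would break out of the <...> IRI syntax.
        if any(ch in uri for ch in "<> \n\t\"{}"):
            raise ValueError(f"not a plain IRI: {uri!r}")
    branches = "\n  UNION\n".join(BRANCH.format(cls=uri) for uri in uris)
    return (
        PREFIXES
        + "SELECT DISTINCT ?smp ?smpDescr (COUNT (DISTINCT ?pv) AS ?score)\n"
        + "WHERE {\n" + branches + "\n}\n"
        + "GROUP BY ?smp ?smpDescr\n"
        + "HAVING (COUNT (DISTINCT ?pv) > 0)\n"
        + "ORDER BY DESC (COUNT (DISTINCT ?pv))\n"
    )


if __name__ == "__main__":
    print(similarity_query([
        "http://purl.obolibrary.org/obo/NCBITaxon_10090",
        "http://example.org/cls/B",  # made-up second class, for illustration
    ]))
```

The score is the number of distinct property values that match any of the given classes, which is what the COUNT in the slide's query computes.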
More (possibly for the hackathon) 
• Continuing with the similarity search 
• Improving linkage with other data sets 
– e.g., targeting samples in ArrayExpress/Atlas 
– e.g., links to EPMC data sets (PMID->PMC conversion), Bio2RDF 
publications, LLD publications 
• Aiming at supporting similar datasets 
– Interested in the ongoing HCLS work on HL7->RDF
– Collaborating with the European biobank community
● Interested in the BBMRI-ontology (http://tinyurl.com/qjttyge)
• Visualisations/widgets 
– Geo-located samples on a map 
– Samples on body map 
– Using the BioJS library
Acknowledgements 
• BioSD Team - Alvis Brazma, Tony Burdett, Adam 
Faulconbridge, Mike Gostev, Helen Parkinson, Rui Perreria, 
Ugis Sarkans, Drashtti Vasant 
• Tony Burdett for the help with Zooma 
• Simon Jupp, Andy Jenkinson, James Malone, for their great 
help with developing and setting up BioSD/RDF 
– The rest of the Linked Data team @EBI 
(http://www.ebi.ac.uk/rdf)
• BiomedBridges FP7 project (http://www.biomedbridges.eu), for
funding us
And you all! 
Contact info: 
www.ebi.ac.uk/biosamples 
www.marcobrandizi.info 
Sorry, we have grown to ~4M samples, yet we don't have all of them, 
not even this year... 
(Sources: http://en.wikipedia.org/wiki/File:Assorted_computer_mice_-_MfK_Bern.jpg,
http://tinyurl.com/otfnhk6, http://tinyurl.com/odkadvn, http://tinyurl.com/pyrqrdf)
Extras
BioSD Data (External Data Sources) 
SPARQL Source: http://tinyurl.com/o95xa5v
Tag Cloud made with http://www.wordle.net
(2013)
submissions   sampleGroups   samples
126490        126492         3925151
Computed on v20141205, SPARQL Source: http://tinyurl.com/ocyb2ld
Total number of triples is 190637851 (http://tinyurl.com/pkyvmnc)
BioSD Data (Common Attribute Types) 
SPARQL Source: http://goo.gl/wk0RHp
Tag Cloud made with http://www.wordle.net
(2013)
Main Ontologies used in BioSD / Linked Data 
• See Doc Page http://www.ebi.ac.uk/rdf/documentation/biosamples
• biosd-terms (http://tiny.cc/biosd_terms)
– a small application ontology defining specific classes and properties, e.g., sample,
sample group, has-knowledgeable-person
• Experimental Factor Ontology (EFO)
– mainly to define/annotate sample attributes
• Ontology for Biomedical Investigations (OBI)
• Information Artifact Ontology (IAO)
• Semanticscience Integrated Ontology (SIO)
– to define main classes in BioSD/RDF 
• Bibliographic Ontology (BIBO) 
– We link publications about submissions/sample sets 
• Dublin Core, schema.org, FOAF 
– for general categories and in the Linked Data spirit 
• Linked automatically by Zooma: many more (e.g., CHEBI, NCBI-Tax, GO)
