BioSamples Database Linked Data
Marco Brandizi, Functional Genomics Team
SWAT4LS Tutorial, Dec 9th, 2013
Find this presentation at http://tiny.cc/bsdswt13
Why a BioSamples Database (aka BioSD)?
• A reference system for searching and browsing information about biological
samples used in, or usable for, biomedical experiments
• Focused on the sample context (i.e., independent of the specific assay
type/technology)
• Supports heterogeneous experiments
– A single place that assay repositories can link to (reference samples; an
authoritative source for repositories such as
Metagenomics/ENA/ArrayExpress)
– A single place for searches and for related-to or same-as relationships
(e.g., see the 'myEquivalents' project)
• Allows for consistency/standardisation of sample attributes/annotations
• Common IT interfaces to access sample information and links to specific
data/repositories (e.g., web, XML/REST, RDF)
Why Linked Data for BioSD?
• Yet another type of interface, potentially useful to application developers
and Linked Data tools
• Integration with similar/related data-sets (see example queries below!)
• Exploitation of ontologies (see below!)
– Standardisation
– A little semantics goes a long way
• Enhanced modelling of certain aspects
– e.g., numbers, intervals, dates and units are detected from string value
labels and converted to triples.
• Who knows?
– Apps!
– See Hackathon ideas below!
The BioSD Model
[Figure: main BioSD entities: Submission, Sample Groups, Samples, External links. Source: http://www.ebi.ac.uk/biosamples]
The BioSD Model
[Figure: a Group's (or Submission's) samples; a Sample's (or Group's) attribute types and values; External links]
BioSD Data (External Data Sources)
SPARQL Source: http://tinyurl.com/o95xa5v
Tag Cloud made with http://www.wordle.net
SPARQL Source: http://tinyurl.com/ocyb2ld
BioSD Data (Common Attribute Types)
SPARQL Source: http://tinyurl.com/pjgdtzs
Tag Cloud made with http://www.wordle.net
BioSD Linked Data Model (Main Entities)
Please have a look at:
http://tinyurl.com/lo33ncc
BioSD Linked Data Model (Sample Attributes)
Please have a look at:
http://tinyurl.com/n5oyvyd
SPARQL Queries
Find Samples and attributes
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE
{
  ?smp
    a biosd-terms:Sample;
    biosd-terms:has-bio-characteristic | sio:SIO_000332 ?pv. # sio:SIO_000332 = 'is about'
  ?pv
    rdfs:label ?pvLabel;
    biosd-terms:has-bio-characteristic-type ?pvType.
  ?pvType
    rdfs:label ?propTypeLabel.
}
• Exercise: use FILTER()/REGEX() to find organism=homo sapiens
• Exercise: Find sample provenance repositories and their links
– Hint: explore the sample's links (?smp) and see what a RepositoryWebRecord
looks like
Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
Exercise Solution: see the examples on that page
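As a starting point for the first exercise, the query above can be narrowed with FILTER and REGEX. This is only a sketch, not the official solution: it assumes that the organism attribute type has a label equal to "organism" and that the value label contains "homo sapiens" (both matched case-insensitively); the labels actually used in the data may differ. The LIMIT is an addition to keep the result small.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>

SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE {
  ?smp a biosd-terms:Sample ;
       biosd-terms:has-bio-characteristic ?pv .
  ?pv rdfs:label ?pvLabel ;
      biosd-terms:has-bio-characteristic-type ?pvType .
  ?pvType rdfs:label ?propTypeLabel .

  # Assumption: the attribute type is labelled "organism" (any case)
  FILTER ( REGEX ( STR(?propTypeLabel), "^organism$", "i" ) )
  # Assumption: the value label mentions "homo sapiens" (any case)
  FILTER ( REGEX ( STR(?pvLabel), "homo sapiens", "i" ) )
}
LIMIT 100
```

Matching on labels is fragile; where the attribute type is annotated with an ontology class, filtering on the class (as in the next query) is likely to be more reliable.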
Samples about a given organism
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE {
  ?smp biosd-terms:has-bio-characteristic ?pv.
  ?pv biosd-terms:has-bio-characteristic-type ?pvType;
      rdfs:label ?pvLabel.
  ?pvType a ?pvTypeClass.
  ?pvTypeClass
    rdfs:label ?propTypeLabel;
    # '*' gives you transitive closure, even when inference is disabled
    # NCBITaxon_1637 = Listeria
    rdfs:subClassOf* <http://purl.obolibrary.org/obo/NCBITaxon_1637>
}
• Exercise: Use the Bioportal Service to first find all subclasses of 'alcohol' (obo:CHEBI_30879)
and then search for samples annotated with those subclasses
– Hint: Use SERVICE <http://sparql.bioontology.org/ontologies/sparql/?apikey=KEY>
Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
Exercise Solution: see one of the examples on that page
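One possible shape for the Bioportal exercise is sketched below. It is not the solution published on the endpoint page. Assumptions: KEY stands for your own Bioportal API key, the remote service exposes rdfs:subClassOf triples for CHEBI, and only direct subclasses are fetched (replace rdfs:subClassOf with rdfs:subClassOf* if the remote endpoint supports property paths).

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT ?smp ?pvLabel ?alcoholClass
WHERE {
  # Step 1: ask Bioportal for the (direct) subclasses of 'alcohol'
  SERVICE <http://sparql.bioontology.org/ontologies/sparql/?apikey=KEY> {
    ?alcoholClass rdfs:subClassOf obo:CHEBI_30879 .
  }

  # Step 2: find samples whose attribute type is an instance of such a class
  ?smp biosd-terms:has-bio-characteristic ?pv .
  ?pv biosd-terms:has-bio-characteristic-type ?pvType ;
      rdfs:label ?pvLabel .
  ?pvType a ?alcoholClass .
}
LIMIT 100
```

The second step reuses the pattern of the Listeria query, with the class bound by the remote service instead of a fixed taxon.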
Geo-located Samples/Sample Groups
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT DISTINCT ?item ?latVal ?longVal WHERE {
  ?item biosd-terms:has-bio-characteristic ?latPv, ?longPv.
  ?latPv
    biosd-terms:has-bio-characteristic-type [ rdfs:label ?latLabel ];
    sio:SIO_000300 ?latVal. # sio:has value
  FILTER ( REGEX ( ?latLabel, "latitude", "i" ) ).
  ?longPv
    biosd-terms:has-bio-characteristic-type [ rdfs:label ?longLabel ];
    sio:SIO_000300 ?longVal. # sio:has value
  FILTER ( REGEX ( ?longLabel, "longitude", "i" ) ).
}
• Exercise: Find all samples having an attribute of type temperature, with a numerical value and a unit
specified. Hint: use sio:SIO_000221 (has unit), sio:SIO_000300 (has value)
• Exercise: Find samples/groups annotated with intervals, which use the properties
biosd-terms:has-low-value and biosd-terms:has-high-value and optionally have a unit.
Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
Exercise Solutions: see the examples on that page
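The temperature exercise can be approached by adapting the geo-location query. The following is a sketch under assumptions, not the solution from the endpoint page: it assumes the unit (sio:SIO_000221) is attached to the same property-value node as the value (sio:SIO_000300), and that the attribute type label contains "temperature".

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX sio: <http://semanticscience.org/resource/>

SELECT DISTINCT ?item ?typeLabel ?value ?unit
WHERE {
  ?item biosd-terms:has-bio-characteristic ?pv .
  ?pv biosd-terms:has-bio-characteristic-type [ rdfs:label ?typeLabel ] ;
      sio:SIO_000300 ?value ;   # sio:has value
      sio:SIO_000221 ?unit .    # sio:has unit (assumed to sit on the same node)

  FILTER ( REGEX ( STR(?typeLabel), "temperature", "i" ) )
  # Keep only values stored as numeric literals
  FILTER ( isNumeric(?value) )
}
LIMIT 100
```

For the interval exercise, the same pattern should apply with sio:SIO_000300 replaced by biosd-terms:has-low-value and biosd-terms:has-high-value, and with the unit triple wrapped in OPTIONAL { }.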
Expressed Genes and Samples
• For http://purl.uniprot.org/uniprot/P04637 (P53 in Human)
• Find the EFO classes for which it is up-regulated in the Atlas (p-value < 1E-9)
• And show the Atlas expression value label. Hints:
– Start from the example http://tinyurl.com/kvvhw6b
– Use the Atlas endpoint: http://www.ebi.ac.uk/rdf/services/atlas/sparql
• Find the samples having attributes that are instances of those EFO classes
• and that come from a repository other than 'ArrayExpress'
• Hints:
– Use SERVICE <http://www.ebi.ac.uk/rdf/services/biosamples/sparql> and a sub-query
– Search property values linked to property types that are instances of the experimental
factors (EFO classes) found by the Atlas
– Then link to the samples, the samples to the submissions, the submissions to the web
records
• Or just have a look: http://tinyurl.com/ln3m7nv (will take a while...)
Ideas for the Hackathon
• Refer to http://tinyurl.com/mo7wgye for details
• From geo-located samples (samples annotated with latitude/longitude) to Google
maps, e.g., by using Exhibit (http://www.simile-widgets.org/exhibit/)
• Take similar datasets (e.g., MAASTRO, Breast Cancer Data, your data), unify the
schemas (e.g., using CONSTRUCT), define federated queries
• Use the Shape or OpenPHACTS validator to define sensible rules for BioSD and
similar data-sets, e.g., must contain an organism, should have a treatment
• Design/build an App (or Web widget) that asks for eligibility criteria, i.e., pairs of
attribute value/type, and translates them into a SPARQL query (or a more complex
search based on SPARQL) to find samples
– Use common ontologies for auto-completion over property types
– Use string-based auto-completion for values
– Consider numerical values, intervals, units
– Do approximate matching, e.g., matching 8/10 of the specified pairs is good.
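The approximate-matching idea in the last bullet can be expressed directly in SPARQL 1.1 by counting how many criteria each sample satisfies. The query below is an illustrative sketch only: the three type/value pairs are made-up examples, matching is done on lower-cased labels, and the threshold of 2 out of 3 stands in for the "8/10" of the slide.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>

SELECT ?smp (COUNT(DISTINCT ?criterion) AS ?matched)
WHERE {
  # Hypothetical eligibility criteria: (id, attribute type, attribute value)
  VALUES (?criterion ?wantedType ?wantedValue) {
    (1 "organism"      "homo sapiens")
    (2 "sex"           "male")
    (3 "organism part" "liver")
  }

  ?smp biosd-terms:has-bio-characteristic ?pv .
  ?pv rdfs:label ?pvLabel ;
      biosd-terms:has-bio-characteristic-type [ rdfs:label ?typeLabel ] .

  # A criterion counts as matched when both labels agree, ignoring case
  FILTER ( LCASE(STR(?typeLabel)) = ?wantedType &&
           LCASE(STR(?pvLabel))   = ?wantedValue )
}
GROUP BY ?smp
# Keep samples that satisfy at least 2 of the 3 criteria
HAVING ( COUNT(DISTINCT ?criterion) >= 2 )
ORDER BY DESC(?matched)
LIMIT 100
```

An application would generate the VALUES rows and the threshold from user input; exact label equality could later be replaced by ontology-based matching on the attribute type classes.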
Acknowledgements
• BioSD Team - Alvis Brazma, Tony Burdett, Adam
Faulconbridge, Mike Gostev, Helen Parkinson, Rui Perreria,
Ugis Sarkans, Drashtti Vasant
• Tony Burdett for the help with Zooma
• Simon Jupp, Andy Jenkinson, James Malone, for their great
help with developing and setting up BioSD/RDF
– The rest of the Linked Data team @EBI
(http://www.ebi.ac.uk/rdf)
• BiomedBridges FP7 project (http://www.biomedbridges.eu), for
funding us
And you all!
Sorry, we have 2.7M samples, but not all of them...
(Source: http://en.wikipedia.org/wiki/File:Assorted_computer_mice_-_MfK_Bern.jpg)
Contact info:
www.ebi.ac.uk/biosamples
www.marcobrandizi.info
Extras
Main Ontologies used in BioSD / Linked Data
• biosd-terms (http://tiny.cc/biosd_terms)
– a small application ontology defining specific classes and properties, e.g.,
sample, sample group, has-knowledgeable-person
• Experimental Factor Ontology (EFO)
– mainly to define/annotate sample attributes
• Ontology for Biomedical Investigations (OBI)
• Information Artifact Ontology (IAO)
• Semanticscience Integrated Ontology (SIO)
– to define main classes in BioSD/RDF
• Bibliographic Ontology (BIBO)
– We link publications about submissions/sample sets
• Dublin Core, schema.org, FOAF
– for general categories and in the Linked Data spirit
• Linked automatically by Zooma: many more (e.g., CHEBI, NCBI-Tax, GO)
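To see how these ontologies show up in the data, one can list the classes most often used to type resources. The sketch below declares commonly used prefixes for the vocabularies named above; the prefix names are conventional choices rather than anything mandated by BioSD, and whether the endpoint answers this aggregate within its time limit has not been checked.

```sparql
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX obo: <http://purl.obolibrary.org/obo/>          # OBI, IAO, CHEBI, NCBITaxon, GO
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Count how many resources are typed with each class, most frequent first
SELECT ?class (COUNT(?resource) AS ?uses)
WHERE {
  ?resource a ?class .
}
GROUP BY ?class
ORDER BY DESC(?uses)
LIMIT 25
```

The namespace of each returned class IRI indicates which of the ontologies it comes from.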
BioSD → RDF Conversion
github.com/EBIBioSamples/biosd2rdf