Linked (Open) Data:
A quick introduction
Nicola Vitucci
nicola.vitucci@gmail.com
Intro
● The WWW is on its way to becoming an
immense database (Web of Data)
● What does this mean?
– What is it made of?
– What is it for?
– Whom is it for?
LOD cloud
http://lod-cloud.net/versions/2014-08-30/lod-cloud_colored.png
5-star data
http://5stardata.info
LD principles
● Original design rules
– Use URIs as unique identifiers for resources (a URI is
not necessarily a URL)
– Use the HTTP URI scheme (rather than other
schemes such as URN), so that URL = URI
– When an ID is dereferenced (= looked up), return
useful information using standard formats (e.g. RDF)
– Provide links to other resources
● LOD = LD + open license
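The dereferencing rule can be sketched as an HTTP request that asks for an RDF representation through the Accept header. This is a minimal sketch using only the Python standard library: the URI is a placeholder built on the example namespace, the function name build_rdf_request is hypothetical, and whether a given server honours the requested formats is an assumption. The request is only built here, not sent.

```python
from urllib.request import Request


def build_rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that prefers RDF representations."""
    # Prefer Turtle, fall back to RDF/XML; the server may ignore this preference.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


# Placeholder URI, not a real resource.
req = build_rdf_request("http://www.example.com/Product1")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing req to urllib.request.urlopen would perform the actual lookup; with a real Linked Data URI the response body would then be the description of the resource.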
RDF
● RDF (Resource Description Framework) is
a fundamental building block of LD
● It is built on the concept of a triple: a subject
linked to an object by means of a predicate
[Figure: example RDF graph. Nodes: ns2:Ingredient 1, ns2:Ingredient 2, ns2:Product1. Edges: ns:product (twice) and ns:weight (twice), with literal values 10 and 20.]
ns = http://www.example.com/ ns2 = http://www.anotherexample.com/
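As a rough illustration, the example graph can be written as a list of (subject, predicate, object) tuples and printed in an N-Triples-like form. The direction of the ns:product edges (ingredient to product) and the pairing of the weights 10 and 20 with the two ingredients are assumptions, since the slide layout does not survive in the text; the helper names term and to_ntriples are hypothetical.

```python
NS = "http://www.example.com/"
NS2 = "http://www.anotherexample.com/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Each entry is one triple: (subject, predicate, object).
# Assumed: ingredients point to the product, and weights are 10 and 20 in order.
triples = [
    (NS2 + "Ingredient1", NS + "product", NS2 + "Product1"),
    (NS2 + "Ingredient2", NS + "product", NS2 + "Product1"),
    (NS2 + "Ingredient1", NS + "weight", 10),
    (NS2 + "Ingredient2", NS + "weight", 20),
]


def term(value):
    """Format a term: integers become typed literals, everything else a <URI>."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    return f"<{value}>"


def to_ntriples(rows):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in rows)


print(to_ntriples(triples))
```

Note that both the subject and the predicate are full URIs from two different namespaces, which is what lets independently published data refer to the same things.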
RDF: data schema
● In a relational database, we have to look
for definitions in the data schema
● Using RDF, instead, we can describe both the
data and their schema in the same format!
● In order to do this, we need vocabularies
– All terms in a vocabulary share a
common base URI, called the namespace
Common vocabularies
● rdf, rdfs, owl – RDF “core” vocabularies
● dcterms – general properties for resources
● foaf – Friend of a Friend
● geo – geolocation
● skos – description of concept schemes and taxonomies
● void, dcat – description of datasets
● doap – description of projects
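A short sketch of how a namespace works in practice: a prefixed name such as foaf:name is shorthand for the namespace URI followed by the local name. The prefix table below lists the commonly published namespace URIs for some of the vocabularies above; the function name expand is hypothetical.

```python
# Commonly used prefix-to-namespace mappings for some vocabularies listed above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI using the prefix table."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("foaf:name"))
print(expand("rdf:type"))
```

Because every term expands to a full URI, two datasets that both use foaf:name are guaranteed to mean the same property, regardless of which prefix label each of them chose.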
Using LD
● Do we need to know all the details about RDF
to be able to use LD?
● “Follow your nose” approach thanks to links
– https://www.wikidata.org
– http://sameas.org
– https://datahub.io
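The "follow your nose" idea can be sketched as a breadth-first walk: look up a URI, collect the URIs its description links to, and repeat for a limited number of hops. Everything in the toy data below is hypothetical (placeholder URIs in the example namespaces, an in-memory dictionary standing in for real HTTP lookups), so this shows only the traversal, not a real crawler.

```python
from collections import deque

# Hypothetical toy "web": each URI maps to the URIs its description links to.
LINKS = {
    "http://www.example.com/Tartu": [
        "http://www.example.com/Estonia",
        "http://www.anotherexample.com/Tartu",
    ],
    "http://www.example.com/Estonia": ["http://www.example.com/Europe"],
    "http://www.anotherexample.com/Tartu": ["http://www.example.com/Tartu"],
    "http://www.example.com/Europe": [],
}


def lookup(uri):
    """Stand-in for dereferencing a URI and extracting its outgoing links."""
    return LINKS.get(uri, [])


def follow_your_nose(start, max_hops=2):
    """Return URIs reachable from start within max_hops, in discovery order."""
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for linked in lookup(uri):
            if linked not in seen:
                seen.append(linked)
                queue.append((linked, hops + 1))
    return seen


for uri in follow_your_nose("http://www.example.com/Tartu"):
    print(uri)
```

The hop limit matters in practice: on the real Web of Data the set of reachable resources grows very quickly.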
Using LD
Search Tartu on Wikidata
more links to visit!
LD and business
● LD is still somewhat of a niche
● Successfully used in some fields, e.g. life
sciences
● Still experimental in other domains, e.g.
corporate, government, and finance
● Do Linked Data have to be Open?
– Not necessarily
LD and business
● Life sciences domain: OpenPHACTS
● “Developed to reduce barriers to drug
discovery in industry, academia and for small
businesses”
● The partners are universities as well as
companies
● It is built on top of existing well-known public
data sources
LD and business
https://www.openphacts.org/2/sci/data.html
LD and business
● Corporate domain: OpenCorporates
● “[...] our primary goal of making company
information more widely available for the
public benefit [...]”
● LD is not (yet) a core part, but they added
some RDF representations and stable,
dereferenceable URIs such as
https://opencorporates.com/id/companies/us_wa/600413485
LD and business
<rdf:Description rdf:about="https://opencorporates.com/id/companies/us_wa/600413485">
  <rdfs:label>MICROSOFT CORPORATION</rdfs:label>
  <rdf:type rdf:resource="http://s.opencalais.com/1/type/er/Company"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/regorg#RegisteredOrganization"/>
  <opencorporates:legalName>MICROSOFT CORPORATION</opencorporates:legalName>
  <rov:legalName>MICROSOFT CORPORATION</rov:legalName>
  <opencorporates:companyType>Regular Corporation - Profit</opencorporates:companyType>
  <rov:orgType>Regular Corporation - Profit</rov:orgType>
  <opencorporates:companyStatus>Active</opencorporates:companyStatus>
  <rov:orgStatus>Active</rov:orgStatus>
  <rov:registration rdf:resource="https://opencorporates.com/id/companies/us_wa/600413485#id"/>
</rdf:Description>
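To show how such a description maps back to triples, here is a minimal sketch that reads a shortened version of the fragment with Python's standard XML parser. Several things are assumptions: the fragment is wrapped in an rdf:RDF root with namespace declarations (the slide shows neither), the rov prefix is bound to the namespace seen in the rdf:type URI, and the opencorporates: properties are left out because their namespace URI is not given. The function name triples_from_rdfxml is hypothetical, and it handles only this simple shape of RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the description, wrapped in an assumed rdf:RDF root.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:rov="http://www.w3.org/ns/regorg#">
  <rdf:Description rdf:about="https://opencorporates.com/id/companies/us_wa/600413485">
    <rdfs:label>MICROSOFT CORPORATION</rdfs:label>
    <rdf:type rdf:resource="http://www.w3.org/ns/regorg#RegisteredOrganization"/>
    <rov:orgStatus>Active</rov:orgStatus>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) from flat rdf:Description blocks."""
    root = ET.fromstring(text)
    out = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is either a linked resource or a plain text literal.
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            out.append((subject, predicate, obj))
    return out


for triple in triples_from_rdfxml(DOC):
    print(triple)
```

For real data a dedicated RDF library is the safer choice, since RDF/XML allows many more syntactic forms than this sketch covers.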
Advantages
● Easier interlinking of heterogeneous data
● Easier creation and maintenance of data
schemas
● Distributed “by default”
● Controlled definition of shared knowledge
Challenges
● Rather new topic
– Needs skill and experience
● As data size increases, performance may worsen
– However, this depends on the use case
● Extra care is necessary when using distributed data
sources
– Accessibility & availability issues
– Data quality
Questions
● We have just scratched the surface
– How to get the most out of LD?
– How to build LD?
– How to maintain LD?
– What is SPARQL? How to use it?
– What is an ontology?
Open Data Day 2017
● “An international celebration of Open Data”
● We’ll have one in Tartu!
– Location: here
– Time: 11:00-16:00
– Facebook: “Open Data Day / Avaandmete päev 2017”
– Web:
● https://okestonia.github.io/opendataday
● https://github.com/okestonia/opendataday
– UT Institute of CS is one of the partners
Thank you!
Aitäh!
Grazie!
