LINKED DATA 
A PERSONAL PERSPECTIVE 
Janifer Gatenby 
OCLC EMEA 
With acknowledgements to Richard Wallis and Anila Angjeli
LINKED DATA 
• What is it? 
• What does it promise? 
• How do we get there? 
• What happens when we get there?
WHAT IS IT?
What is it? 
A WAY OF EXPRESSING A LINK 
• Not really a new way of linking, but a new way of expressing a link 
It is about using canonical trusted globally 
referenceable identifiers for concepts, people, 
organisations, locations etc. instead of copying text 
strings and losing the connection with the 
authoritative sources they came from. 
Richard Wallis
What is it? 
MARC21 LINKS 
• 700 10 $a name $e role $0 authority control number 
– (added entry in a MARC record for a name related to a work, not the main 
author) 
These familiar links reference an authority record in the 
same database as a bibliographic record, hence have 
no address portion. Linked data extends the linking 
range.
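The contrast can be made concrete with a minimal Python sketch. It reads a 700 field and turns the local $0 control number into a globally resolvable URI. It makes two assumptions: the $0 value carries the MARC organisation code "(DLC)" followed by an LC control number, and id.loc.gov name URIs are formed by appending that number without spaces. The control number comes from the id.loc.gov example URI in this deck. The name in the field is hypothetical.

```python
from __future__ import annotations
import re

# Assumed pattern for LC name authority URIs on id.loc.gov.
LC_NAMES_BASE = "http://id.loc.gov/authorities/names/"

def subfields(field: str) -> dict[str, list[str]]:
    """Split a field written as '$a ... $e ... $0 ...' into code -> values."""
    result: dict[str, list[str]] = {}
    for chunk in field.split("$")[1:]:
        if not chunk:
            continue  # tolerate a stray '$$'
        code, value = chunk[0], chunk[1:].strip()
        result.setdefault(code, []).append(value)
    return result

def authority_uri(control_number: str) -> str | None:
    """Turn a '(DLC)nr 89009099'-style $0 value into an id.loc.gov URI.
    Returns None for numbers from other organisations (no known base URI)."""
    m = re.fullmatch(r"\(DLC\)\s*([a-z]{1,3})\s*(\d+)", control_number.strip())
    if m is None:
        return None
    prefix, digits = m.groups()
    return LC_NAMES_BASE + prefix + digits

# Hypothetical 700 field; only the control number is taken from the slides.
FIELD_700 = "$a Example, Name $e editor $0 (DLC)nr 89009099"
```

In the local record, the $0 value only works inside one database. The URI returned by `authority_uri` can be followed from anywhere.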
What is it? 
EXTENDING THE LINKING RANGE: URI 
• URI – immutable address as well as an identifier 
– http://id.loc.gov/authorities/names/nr89009099 
– http://viaf.org/viaf/116774723 
– http://isni.org/isni/000000114556841 
9 NACO libraries – 
LC, 
National Agricultural Library, 
National Library of Medicine, 
British Library, 
NL Mexico, 
NLNZ, 
NL Scotland, 
NL South Africa, 
NL Wales
What is it? 
EXTENDING THE LINKING RANGE: RDF 
• RDF – metadata is expressed in triples 
– Data 
– Data label (properties) 
– Vocabulary from which the label comes (gives context to the label)
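As a rough illustration of the three parts listed above, this Python sketch models one statement as a subject-predicate-object triple. The data is the object and the data label is the predicate. The vocabulary is the namespace URI that the label expands into. The subject URI is a hypothetical example.org address. The FOAF, OWL and RDF namespace URIs are the standard ones.

```python
from typing import NamedTuple

# Standard namespace URIs: the "vocabulary" part that gives a label its context.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

class Triple(NamedTuple):
    subject: str    # the thing being described, identified by a URI
    predicate: str  # the data label (property), as a full URI
    obj: str        # the data: a literal value or another URI

def expand(name: str) -> str:
    """Expand a prefixed label such as 'foaf:name' to its full URI.
    Values that do not use a known prefix are returned unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name

# Hypothetical subject URI; the data label comes from FOAF.
STATEMENT = Triple(
    subject="http://example.org/person/1",
    predicate=expand("foaf:name"),
    obj="De Groot, Gerard J., 1955-",
)
```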
What is it? 
SPARQL 
• A database can offer a SPARQL endpoint, i.e. a service that receives SPARQL queries and answers them from its RDF data 
– Author [schema] Name [data label] De Groot, Gerard J., 1955 [data] 
• “SPARQL allows users to write queries against data that can loosely be called "key-value" data, 
more specifically it is data that follows the RDF specification of the W3C. The entire database is 
thus a set of "subject-predicate-object" triples.” 
• SPARQL 1.1: stable release 2013-03-21 
– W3C Recommendation 
http://en.wikipedia.org/wiki/SPARQL 
http://www.w3.org/blog/SW/2008/01/15/sparql_is_a_recommendation/
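The quoted description can be made concrete with a toy matcher. This Python sketch evaluates a single SPARQL-style triple pattern against an in-memory list of triples. Terms starting with "?" are variables. It is not a SPARQL engine: real endpoints evaluate whole graph patterns with joins, FILTERs and more. The `QUERY` string only shows what an equivalent request might look like. The example.org subjects are hypothetical.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Roughly what a client would send to an endpoint (not executed here).
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE { ?person foaf:name "De Groot, Gerard J., 1955-" }
"""

def match(pattern, triples):
    """Return one binding dict per triple that fits the pattern.
    Terms starting with '?' are variables; anything else must match exactly."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break
        else:
            results.append(binding)
    return results

# Hypothetical data held by the endpoint.
DATA = [
    ("http://example.org/p/1", FOAF_NAME, "De Groot, Gerard J., 1955-"),
    ("http://example.org/p/2", FOAF_NAME, "Wales, Jimmy"),
]
```

Here `match(("?person", FOAF_NAME, "De Groot, Gerard J., 1955-"), DATA)` returns a single binding for `?person`, which is the shape of result a SELECT query produces.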
What is it? 
LINKED DATA PRINCIPLES 
1. Use URIs as names for things 
2. Use HTTP URIs so people can look up those names 
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) 
4. Include links to other URIs, so that they can discover more 
Tim Berners-Lee - 2006
What is it? 
VOCABULARIES 
• Vocabularies are not schemas; they are lists of defined data labels (concepts) 
– Schema.org (search engines) 
– BIBFRAME (library community) 
– FOAF (Friend of a Friend) 
– OWL (e.g. owl:sameAs) 
• Vocabularies can be mixed. Example statements using FOAF (Turtle): 
@prefix foaf: <http://xmlns.com/foaf/0.1/> . 
<#jw> a foaf:Person ; 
  foaf:name "Jimmy Wales" ; 
  foaf:mbox <mailto:jwales@bomis.com> ; 
  foaf:homepage <http://www.jimmywales.com/> ; 
  foaf:nick "Jimbo" .
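To show vocabularies mixed in a single description, this Python sketch writes FOAF and OWL statements about one subject as N-Triples. In N-Triples every property is spelled out as a full URI, so the vocabulary it comes from is explicit. The subject and the owl:sameAs target are hypothetical example.org URIs.

```python
from __future__ import annotations

# Standard namespace URIs for the two vocabularies being mixed.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}

def term(value: str) -> str:
    """Render one term for N-Triples: prefixed names and URIs go in <...>,
    anything else becomes a quoted literal with quotes and backslashes escaped."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIXES:
        return f"<{PREFIXES[prefix]}{local}>"
    if value.startswith(("http://", "https://", "mailto:")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(subject: str, pairs: list[tuple[str, str]]) -> str:
    """One N-Triples line per (property, value) pair about `subject`."""
    return "\n".join(f"{term(subject)} {term(p)} {term(v)} ." for p, v in pairs)

# Hypothetical subject and sameAs target; FOAF and OWL properties side by side.
EXAMPLE = to_ntriples(
    "http://example.org/people/jwales",
    [
        ("foaf:name", "Jimmy Wales"),
        ("foaf:mbox", "mailto:jwales@bomis.com"),
        ("foaf:nick", "Jimbo"),
        ("owl:sameAs", "http://example.org/elsewhere/jimmy-wales"),
    ],
)
```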
WHAT DOES IT PROMISE?
What does it promise? 
• Enriched displays without data maintenance 
• Better harvesting and ranking 
• because of markup 
• and because of links 
• Navigation to pages with additional information 
– Example: from VIAF via ISNI to encyclopaedias, rights management societies (digitisation rights) and Bowker (biographies from flyleaves)
What does it promise? 
INTERCONNECTING FRENCH CULTURAL HERITAGE TREASURES ON THE WEB 
[Diagram: BnF main catalogue (MARC), BnF Archives and Manuscripts catalogue (EAD), digital documents (DC), other BnF resources and external resources, brought together with Semantic Web techniques (modelling, matching, clustering, alignments) to provide web pages for Internet users and raw data for machines]
What does it promise? 
[Screenshot annotations: BnF persistent ID; content imported from Wikipedia and integrated in the page]
What does it promise? 
[Screenshot annotations: data can be downloaded; vocabularies used are existing ones plus others defined for the specific needs of the project] 
Information about the data model (or ontology) at: http://data.bnf.fr/about-en
What does it promise? 
BIG DATA AS RDF 
• Data is reusable without a full-blown conversion 
• Permits third-party analysis of big data sets 
• Data mining for new information
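RDF data can be reused without a full-blown conversion because every dataset shares the same triple model. Combining two datasets is then essentially a set union. This Python sketch assumes two hypothetical publishers, each exposing owl:sameAs links as simple N-Triples containing URIs only. It merges them and then does a trivial count. The cluster URI is hypothetical. The GND, IdRef and BnF URIs are those in the VIAF example in this deck.

```python
from __future__ import annotations
from collections import Counter

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def parse_links(text: str) -> set[tuple[str, str, str]]:
    """Read simple N-Triples lines of the form '<s> <p> <o> .'.
    Assumption: every term is a URI (no literals, blank nodes or spaces)."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = (part.strip("<>") for part in line.rstrip(" .").split())
        triples.add((s, p, o))
    return triples

# Two hypothetical datasets published by different parties.
DATASET_A = f"""
<http://example.org/cluster/1> <{SAME_AS}> <http://d-nb.info/gnd/12422900X> .
<http://example.org/cluster/1> <{SAME_AS}> <http://www.idref.fr/034977651/id> .
"""
DATASET_B = f"""
<http://example.org/cluster/1> <{SAME_AS}> <http://d-nb.info/gnd/12422900X> .
<http://example.org/cluster/1> <{SAME_AS}> <http://data.bnf.fr/ark:/12148/cb12299846b#foaf:Person> .
"""

# Same data model on both sides, so merging is a set union; duplicates collapse.
GRAPH = parse_links(DATASET_A) | parse_links(DATASET_B)

# A trivial piece of "data mining": equivalence links per subject.
LINKS_PER_SUBJECT = Counter(s for s, p, _ in GRAPH if p == SAME_AS)
```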
HOW DO WE GET THERE?
How do we get there? 
MAKING THE LINKS 
DNB CultureGraph 
– “It’s all about creating 
connections” 
– DDC to RVK (German 
classification) by 
comparing search 
results 
– GND (names) to 
German Wikipedia
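The slide says DDC was mapped to RVK by comparing search results. This Python sketch shows one plausible way to do that. Each class is run as a search, the record IDs it retrieves are collected, and classes whose result sets overlap strongly are paired. The Jaccard measure, the 0.5 threshold, and all class codes and record IDs are assumptions for illustration. They are not a description of CultureGraph's actual method.

```python
from __future__ import annotations

def jaccard(a: set[str], b: set[str]) -> float:
    """Records retrieved by both searches, as a share of records retrieved by either."""
    return len(a & b) / len(a | b) if a or b else 0.0

def align(scheme_a: dict[str, set[str]],
          scheme_b: dict[str, set[str]],
          threshold: float = 0.5) -> dict[str, tuple[str, float]]:
    """Map each class in scheme A to the best-overlapping class in scheme B.
    Each dict maps a class code to the set of record IDs its search returned."""
    mapping: dict[str, tuple[str, float]] = {}
    for code_a, hits_a in scheme_a.items():
        best_code, best_score = None, 0.0
        for code_b, hits_b in scheme_b.items():
            score = jaccard(hits_a, hits_b)
            if score > best_score:
                best_code, best_score = code_b, score
        if best_code is not None and best_score >= threshold:
            mapping[code_a] = (best_code, round(best_score, 2))
    return mapping

# Hypothetical search results: class code -> IDs of records retrieved.
DDC = {"DDC-1": {"r1", "r2", "r3", "r4"}, "DDC-2": {"r5", "r6"}}
RVK = {"RVK-1": {"r1", "r2", "r3"}, "RVK-2": {"r6", "r7", "r8"}}
```

With this data, `align(DDC, RVK)` pairs DDC-1 with RVK-1 (overlap 0.75). DDC-2 is left unmapped because its best overlap is only 0.25.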
How do we get there? 
EXAMPLE: VIAF 
• Ingests data to compare records and create links 
• Makes clusters, each with a cluster identifier 
• Ingesting data is preferred to external linking 
– Wikipedia, ISNI, WorldCat Identities 
– More data is used for clustering, so the clusters are more reliable 
• VIAFBot makes reciprocal links in Wikipedia / Wikidata 
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/> 
<rdf:type rdf:resource="http://rdvocab.info/uri/schema/FRBRentitiesRDA/Person"/> 
<foaf:name>De Groot, Gerard J., 1955-</foaf:name> 
<foaf:name>DeGroot, Gerard J., 1955-</foaf:name> 
<rdaGr2:dateOfBirth>1955-06-22</rdaGr2:dateOfBirth> 
<owl:sameAs rdf:resource="http://data.bnf.fr/ark:/12148/cb12299846b#foaf:Person"/> 
<owl:sameAs rdf:resource="http://www.idref.fr/034977651/id"/> 
<owl:sameAs rdf:resource="http://d-nb.info/gnd/12422900X"/>
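The ingest-compare-cluster cycle can be illustrated with a deliberately crude Python sketch. Each heading is reduced to a match key, records that share a key form a cluster, and the cluster is linked to its members with owl:sameAs. VIAF's real matching draws on much more evidence, which is the point of ingesting more data. The match-key rule and all example.org URIs here are hypothetical.

```python
from __future__ import annotations
import re
from collections import defaultdict

def match_key(heading: str) -> str:
    """Crude key: lower-case, keep only ASCII letters and digits, so that
    'De Groot, Gerard J., 1955-' and 'DeGroot, Gerard J., 1955-' coincide."""
    return re.sub(r"[^0-9a-z]", "", heading.lower())

def cluster(records: dict[str, str]) -> dict[str, list[str]]:
    """Group record URIs whose headings share a match key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for uri, heading in records.items():
        groups[match_key(heading)].append(uri)
    return dict(groups)

def same_as_triples(cluster_uri: str, members: list[str]) -> list[tuple[str, str, str]]:
    """Link a cluster identifier to every member record."""
    return [(cluster_uri, "owl:sameAs", m) for m in sorted(members)]

# Hypothetical ingested authority records: source URI -> heading.
RECORDS = {
    "http://example.org/source-a/1": "De Groot, Gerard J., 1955-",
    "http://example.org/source-b/7": "DeGroot, Gerard J., 1955-",
    "http://example.org/source-c/3": "De Groot, Gerard, 1920-",
}
```

The two spellings of the 1955 heading fall into one cluster. The 1920 heading stays separate.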
[Diagram: cross-domain, bridging domains: libraries; text rights; trade sources; music rights; encyclopaedias; researchers & professional; granting organisations; professional societies; article databases; theses databases; archives and museums]
How do we get there? 
EXAMPLE: ISNI: 15 MILLION LINKS 
Linked Data: isni.org/isni/
How do we get there? 
LA TROBE UNIVERSITY LINKS: 3,427
How do we get there? 
LA TROBE UNIVERSITY: 1,864 VIAF LINKS
How do we get there? 
ISNI – A LINKING IDENTIFIER 
• Identifiers seal uniqueness: without an identifier, “n” other data elements are needed to establish uniqueness 
• Stable identifier; stable metadata: 
• assigned only where there is confidence that the metadata is of sufficient quality and completeness to establish uniqueness 
• ISNI system + Quality Team (BL & BnF) 
Linking erroneous data propagates errors.
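A stable identifier only helps if it is transcribed correctly; otherwise the wrong record is linked and the error propagates. ISNI's 16th character is a check character computed with ISO 7064 MOD 11-2, and this Python sketch implements that calculation. The example values are ORCID's published example iDs. ORCID iDs use the same 16-character format and check scheme.

```python
import re

def isni_check_char(base: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ISNI."""
    if not re.fullmatch(r"[0-9]{15}", base):
        raise ValueError("expected exactly 15 ASCII digits")
    total = 0
    for ch in base:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_isni(isni: str) -> bool:
    """Accept '0000 0002 1825 0097', '0000-0002-1825-0097' or the compact form."""
    compact = isni.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16 or not re.fullmatch(r"[0-9]{15}", compact[:15]):
        return False
    return isni_check_char(compact[:15]) == compact[15]
```

For example, `is_valid_isni("0000-0002-1825-0097")` is `True`, and changing the last digit to 8 makes it `False`.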
How do we get there? 
LINKS ARE MADE ONCE – THEN INHERITED 
• URI – immutable address as well as an identifier 
– http://id.loc.gov/authorities/names/nr89009099 
– http://viaf.org/viaf/116774723 
– http://isni-url.oclc.nl/isni/000000114556841 
9 NACO libraries – 
Library of Congress, 
National Agricultural Library, 
National Library of Medicine, 
British Library, 
NL Mexico, 
NLNZ, 
NL Scotland, 
NL South Africa, 
NL Wales
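Inheritance of a link made once can be sketched as graph reachability. Suppose a local record is linked to one cluster, and that cluster is linked to other authority files. Every equivalent URI is then reachable by following the links. This Python sketch treats owl:sameAs-style links as symmetric and walks them breadth-first. The local and cluster URIs are hypothetical. The GND, IdRef and BnF URIs are those in the VIAF example in this deck.

```python
from __future__ import annotations
from collections import defaultdict, deque

def inherited_links(start: str, links: list[tuple[str, str]]) -> set[str]:
    """Every URI reachable from `start`, treating each link as symmetric
    (as owl:sameAs is)."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

LOCAL = "http://example.org/local/42"           # hypothetical local record
CLUSTER = "http://example.org/viaf-cluster/1"  # hypothetical cluster URI
SAME_AS_LINKS = [
    (LOCAL, CLUSTER),  # the only link the local cataloguer makes
    (CLUSTER, "http://d-nb.info/gnd/12422900X"),
    (CLUSTER, "http://www.idref.fr/034977651/id"),
    (CLUSTER, "http://data.bnf.fr/ark:/12148/cb12299846b#foaf:Person"),
]
```

The same reachability explains why linking erroneous data propagates errors: one bad link pulls in an entire wrong cluster.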
WHAT HAPPENS WHEN WE GET THERE?
What happens when we get there? 
HOW DOES SEARCHING WORK? 
• Search happens mostly in the search engines 
• Library catalogue concentrates on: 
– Being linked to 
– Linking out (navigation) 
– Delivery, particularly of digitised material available immediately
What happens when we get there? 
HOW DO SEARCH AND LINKED DATA INTERACT? 
• Is search really fully delegated to search 
engines & larger union catalogues?
What happens when we get there? 
SEARCH TYPES 
Search type: where it happens 
• Known item: search engines; also more specific sources where noise is a problem 
• Subject search: search engines; also more specific sources, to reduce noise and benefit from more precise searching capabilities 
• Index browse: catalogues 
• Follow a link: everywhere; in library catalogues, from a full record display 
The more your catalogue is linked in, the more likely it is to attract all types of searches.
What happens when we get there? 
STORE ONLY THE LINKS? 
• Data needed 
• For making indexes 
• For comparisons, e.g. de-duplication 
• Data mining 
It is about using canonical trusted globally 
referenceable identifiers for concepts, people, 
organisations, locations etc. instead of copying text 
strings and losing the connection with the 
authoritative sources they came from. 
This doesn’t mean that you only need the links; you often also need to ingest the data. 
Besides, data storage is no longer the constraint it once was.
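The point that indexes need data, not just links, can be shown with a small Python sketch. It builds an inverted index from labels ingested alongside each URI, which would be impossible from bare URIs. The URIs and the simple word tokenisation are assumptions for illustration.

```python
from __future__ import annotations
import re
from collections import defaultdict

def words(text: str) -> list[str]:
    """Lower-cased word tokens; a deliberately simple normalisation."""
    return re.findall(r"\w+", text.lower())

def build_index(ingested: dict[str, list[str]]) -> dict[str, set[str]]:
    """Inverted index: word -> URIs whose ingested labels contain that word.
    A bare URI has no labels, so without ingesting data there is nothing to index."""
    index: dict[str, set[str]] = defaultdict(set)
    for uri, labels in ingested.items():
        for label in labels:
            for word in words(label):
                index[word].add(uri)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """URIs matching every word of the query (AND semantics)."""
    postings = [index.get(w, set()) for w in words(query)]
    return set.intersection(*postings) if postings else set()

# Hypothetical ingested data: URI -> labels copied from the authoritative source.
INGESTED = {
    "http://example.org/person/1": ["De Groot, Gerard J., 1955-", "DeGroot, Gerard J., 1955-"],
    "http://example.org/person/2": ["Wales, Jimmy"],
}
```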
READ FURTHER 
• http://www.slideshare.net/tulipbiru64/the-single-power-of-link-richard-wallis 
• http://www.slideshare.net/rjw/linked-data-and-oclc

It19 20140721 linked data personal perspective


Editor's Notes

  • #23 Multiple domains, 25 contributors, with more (Zetoc and OCLCT) to load. A further 10 contributors are in the loading queue, and more are expected. Members of the ISNI International Agency (ISNI-IA) are highlighted.