Metadata is back!
Bernhard Haslhofer - Cornell University

JCDL 2011 - Semantic Web Technologies for Libraries and Readers Workshop
Ottawa, Canada
Thursday, June 16th 2011
schema.org Book Example
  <img src="catcher-in-the-rye-book-cover.jpg" />
  The Catcher in the Rye - Mass Market Paperback
  by <a href="/author/jd_salinger.html">J.D. Salinger</a>

  Price: $6.99
  In Stock

  Product details
  224 pages
  Publisher: Little, Brown, and Company - May 1, 1991
  Language: English
  ISBN-10: 0316769487
schema.org Book Example
<div itemscope itemtype="http://schema.org/Book">

  <img itemprop="image" src="catcher-in-the-rye-book-cover.jpg" />
  <span itemprop="name">The Catcher in the Rye</span> -
  <link itemprop="bookFormat" href="http://schema.org/Paperback" />Mass Market Paperback
  by <a itemprop="author" href="/author/jd_salinger.html">J.D. Salinger</a>

  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    Price: <span itemprop="price">$6.99</span>
    <meta itemprop="priceCurrency" content="USD" />
    <link itemprop="availability" href="http://schema.org/InStock" />In Stock
    <link itemprop="url" href="http://en.wikipedia.org/wiki/The_Catcher_in_the_Rye" />
  </div>

  ...

</div>
The story so far...
Library Catalogue
 • Controlled Vocabulary
 • Metadata
 • Identifier
[Images: (c) Vienna University Library; (c) Bill Steele/Cornell Chronicle]
OPAC
 • Metadata
 • Controlled Vocabulary
 • Identifier
WWW / Wikipedia / Search Engines
 • Identifier?
 • Metadata?
 • Controlled Vocabulary?
getMetadata(Web): void
Semantic Web - Early Vision

 "Mom needs to see a specialist and then has to have a series of physical
 therapy sessions. Biweekly or something. I'm going to have my agent set up
 the appointments."

 “The Semantic Web will bring structure to the meaningful content of Web
 pages, creating an environment where software agents roaming from page to
 page can readily carry out sophisticated tasks for users”

 “For the semantic web to function, computers must have access to structured
 collections of information and sets of inference rules that they can use to
 conduct automated reasoning.”


~2000                                                            2011
Semantic Web Technologies
 Layers, top to bottom:
 • User Interface & Applications
 • Trust
 • Proof
 • Unifying Logic
 • Query: SPARQL | Ontology: OWL | Rules: RIF
 • RDF-S
 • Data Model: RDF
 • XML
 • URI | Unicode
 • Crypto (vertical layer alongside the stack)
RDFa & Microformats


 • Mechanisms to embed structured metadata in Web
   pages
 • Define and/or reuse (X)HTML attributes to augment
   information in Websites with machine-readable
   semantics




RDFa Example
  <div xmlns="http://www.w3.org/1999/xhtml"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
       xmlns:v="http://www.w3.org/2006/vcard/ns#">

     <div about="http://example.com/me/behas" typeof="v:VCard">
       <span property="v:fn">Bernhard Haslhofer</span>
       <span property="v:nickname">behas</span>
       <div rel="v:adr">
         <div typeof="v:Address v:Work">
           <span property="v:street-address">301 College Avenue</span>
           <span property="v:locality">Ithaca</span>,
           <span property="v:postal-code">14850</span>,
           <span property="v:country-name">United States</span>.
         </div>
       </div>
       <a rel="v:email"
          href="mailto:bernhard.haslhofer@cornell.edu">bernhard.haslhofer@cornell.edu</a>.
     </div>
  </div>
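
Read as RDF, the RDFa example should yield roughly the triples below. This is a hand-written transcription for illustration, not the output of an RDFa parser; the blank node label `_:adr` and the helper names `term` and `to_ntriples` are assumptions of this sketch.

```python
# Namespaces declared by the xmlns attributes in the RDFa example.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
V = "http://www.w3.org/2006/vcard/ns#"

ME = "http://example.com/me/behas"  # value of the about attribute
ADR = "_:adr"                       # blank node label, chosen for this sketch

# Each entry: (subject, predicate, object, object_is_literal)
TRIPLES = [
    (ME, RDF + "type", V + "VCard", False),
    (ME, V + "fn", "Bernhard Haslhofer", True),
    (ME, V + "nickname", "behas", True),
    (ME, V + "adr", ADR, False),
    (ADR, RDF + "type", V + "Address", False),
    (ADR, RDF + "type", V + "Work", False),
    (ADR, V + "street-address", "301 College Avenue", True),
    (ADR, V + "locality", "Ithaca", True),
    (ADR, V + "postal-code", "14850", True),
    (ADR, V + "country-name", "United States", True),
    (ME, V + "email", "mailto:bernhard.haslhofer@cornell.edu", False),
]


def term(value, is_literal=False):
    """Format one term in N-Triples style."""
    if is_literal:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if value.startswith("_:"):
        return value  # blank node
    return "<" + value + ">"


def to_ntriples(triples):
    """Serialize the triples, one statement per line."""
    return "\n".join(
        f"{term(s)} {term(p)} {term(o, lit)} ." for s, p, o, lit in triples
    )


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

The nested `rel="v:adr"` element carries no URI of its own, which is why the address appears as a blank node that links the person to the address properties.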




Microformats Example
   <div class="vcard">

        <span class="fn">Bernhard Haslhofer</span>

        <div class="adr">
          <div class="street-address">301 College Avenue</div>
          <span class="locality">Ithaca</span>
          <span class="postal-code">14850</span>
          <span class="country-name">United States</span>
        </div>

        <a class="email"
        href="mailto:bernhard.haslhofer@cornell.edu">bernhard.haslhofer@cornell.edu</a>

   </div>




Linked Data

 • There is lots of information on the Web
 • ... valuable information that can be (re-)used
 • Problem
   • information is usually expressed in the form of HTML documents
   • the underlying raw data are locked in closed data silos (mostly DBMS)



Why Linked Data?

 • The Web is successful because it provides
   • Uniform encoding (HTML)
   • Uniform addressing (URI)
   • Uniform transportation (HTTP)
   for the exchange of documents.
 • Why not apply the same mechanism to the underlying
   data?


What is Linked Data?
        • A pragmatic method to build a Web of Data
        • Architectural style based on SW standards
        • Intelligent agents not primary focus


                             Web




Publishing Data

 • Distinguish between non-information and information
   resources
 • Sample non-information resource
   • http://dbpedia.org/resource/The_Catcher_in_the_Rye
 • Sample information resources
   • http://dbpedia.org/page/The_Catcher_in_the_Rye - HTML
   • http://dbpedia.org/data/The_Catcher_in_the_Rye - RDF
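
How a client gets from the non-information resource to one of the information resources can be sketched as follows. The sketch simulates the server locally and makes no network calls; the functions `dereference` and `follow` are hypothetical, and the routing rule (RDF for `application/rdf+xml`, HTML otherwise) is an assumption modelled on the three URIs above, so a real server may behave differently.

```python
from typing import Tuple

BASE = "http://dbpedia.org"


def dereference(uri: str, accept: str) -> Tuple[int, str]:
    """Simulate one HTTP request; return (status, redirect target or label)."""
    resource_prefix = BASE + "/resource/"
    if uri.startswith(resource_prefix):
        name = uri[len(resource_prefix):]
        # A non-information resource has no representation of its own,
        # so the simulated server answers 303 See Other and points to a
        # document that describes it.
        if "application/rdf+xml" in accept:
            return 303, f"{BASE}/data/{name}"
        return 303, f"{BASE}/page/{name}"
    if uri.startswith(BASE + "/data/"):
        return 200, "RDF document"
    if uri.startswith(BASE + "/page/"):
        return 200, "HTML document"
    return 404, ""


def follow(uri: str, accept: str, max_hops: int = 5) -> Tuple[str, int]:
    """Follow 303 redirects until a non-redirect status is returned."""
    for _ in range(max_hops):
        status, target = dereference(uri, accept)
        if status != 303:
            return uri, status
        uri = target
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    start = BASE + "/resource/The_Catcher_in_the_Rye"
    print(follow(start, "application/rdf+xml"))
    print(follow(start, "text/html"))
```

The same starting URI leads to different documents depending on the Accept header, which is what lets one identifier serve both browsers and data clients.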


Retrieving Linked Data




Microdata (HTML5)

 • A very young HTML5 proposal that extends
   Microformats and addresses their shortcomings
 • Items are created within an itemscope
 • Every item is assigned an arbitrary number of
   properties (itemprop)
 • Uses global identifiers for typing and naming items

Microdata Example

<div itemscope itemtype="http://data-vocabulary.org/Person">

  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.

  <div itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
    <span itemprop="street-address">301 College Avenue</span>
    <span itemprop="locality">Ithaca</span>
    <span itemprop="country-name">United States</span>
  </div>

</div>
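
To show how a consumer might read such markup, here is a minimal sketch that turns itemscope, itemtype and itemprop into nested dictionaries using only the Python standard library. The class `MicrodataParser` and the function `parse_microdata` are hypothetical names. The sketch only handles text values, so it ignores the rule that elements such as `a`, `link`, `meta` and `img` take their value from an attribute, and it assumes every element has a closing tag.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Builds nested dicts from itemscope/itemtype/itemprop (text values only)."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items found in the document
        self._open = []    # one (item or None, property or None) per open element
        self._scopes = []  # items currently being filled, innermost last

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and self._scopes:
                # A nested item becomes the value of the enclosing property.
                self._scopes[-1]["properties"][prop] = item
            else:
                self.items.append(item)
            self._scopes.append(item)
            prop = None  # the value is the nested item, not the element text
        self._open.append((item, prop))

    def handle_endtag(self, tag):
        if not self._open:
            return
        item, _ = self._open.pop()
        if item is not None:
            self._scopes.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._open or not self._scopes:
            return
        _, prop = self._open[-1]
        if prop:
            props = self._scopes[-1]["properties"]
            props[prop] = props.get(prop, "") + text


def parse_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://data-vocabulary.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
    <span itemprop="street-address">301 College Avenue</span>
    <span itemprop="locality">Ithaca</span>
    <span itemprop="country-name">United States</span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(parse_microdata(SAMPLE))
```

The nested address comes out as its own typed item inside the person, which is the main structural difference from the flat class names used by Microformats.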




Google Rich Snippets / SEO




Facebook




Facebook




schema.org




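[Figure: technical / conceptual complexity (vertical axis) over time, ~2000 to 2011 (horizontal axis).
 The Semantic Web technology stack (URI, Unicode, XML, RDF, RDF-S, OWL, SPARQL, RIF, Crypto,
 Unifying Logic, Proof, Trust, User Interface & Applications) is placed highest, RDFa below it,
 and Microformats and Microdata lowest, with Microdata furthest to the right.]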
Where are we now?
(c) http://wiki.bib.uni-mannheim.de/dc-provenance/lib/exe/detail.php?id=europeana_example&media=europeana-ore.png
What next?
Deal with schema.org

• Ignore it?
• Adopt it?
• Align existing library models with schema.org?
• Schema.org provides an extension mechanism for
 • properties
 • classes
Data Quality / Resource Sync

• The Web is not static
• Resources and their representations might change or
 disappear over time
• Make sure that applications can
 • synchronize resources and learn about changes
 • go back in time
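
One naive way an application could learn about changes is to fingerprint each resource representation and compare snapshots taken at different times. The sketch below is only an illustration under that assumption; the function names and the example.org URIs are hypothetical, and it is not a synchronization protocol proposed in this talk.

```python
import hashlib


def fingerprint(representation: bytes) -> str:
    """Hash a representation so that any change in content is detectable."""
    return hashlib.sha256(representation).hexdigest()


def snapshot(resources):
    """Map each URI to a fingerprint of its current representation."""
    return {uri: fingerprint(body) for uri, body in resources.items()}


def diff(old, new):
    """Compare two snapshots and report what changed between them."""
    return {
        "created": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(
            uri for uri in old.keys() & new.keys() if old[uri] != new[uri]
        ),
    }


if __name__ == "__main__":
    # Hypothetical resources at two points in time.
    monday = snapshot({
        "http://example.org/a": b"<p>first version</p>",
        "http://example.org/b": b"<p>unchanged</p>",
        "http://example.org/c": b"<p>will disappear</p>",
    })
    tuesday = snapshot({
        "http://example.org/a": b"<p>second version</p>",
        "http://example.org/b": b"<p>unchanged</p>",
        "http://example.org/d": b"<p>new resource</p>",
    })
    print(diff(monday, tuesday))
```

Keeping the dated snapshots, together with the stored representations, is also one simple way to support going back in time.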
Use Web Data in Apps


• Aggregate Web resources into special collections
• DBpedia provides resource descriptions translated
 into 90+ languages!
• Use URIs instead of labels for tagging
• Combine and mash up data
• Analyze data ...
Summary
Metadata is back
• Metadata was introduced in the 19th century to deal
 with information overload
• Cataloguing rules and workflows evolved over time
• The Web seemed to work pretty well without
 metadata (information retrieval, natural language processing)
• Now there are strong indicators that structured
 metadata on the Web will play an important role in
 the future
• Shouldn’t libraries / librarians be part of that?
References



•   Coyle, K.: Library Data in a Modern Context. In: Understanding the
    Semantic Web: Bibliographic Data and Metadata. Library Technology
    Reports, January 2010.

•   http://blog.mediaspaces.info/ (Linked Data in Libraries State-of-the-Art)
BACKUP
Metadata Building Blocks

 • Schema Definition Language: class, property, relationship
 • Metadata Schema: Title, Author, Genre
 • Metadata (describes a digital / non-digital information object):
   • Title: The Catcher in the Rye
   • Author: Salinger, J.D.
   • Genre: Fiction
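
The relationship between the three building blocks can be sketched in code: a schema names the allowed properties, and a metadata record assigns values to them. The class names `MetadataSchema` and `MetadataRecord` and the schema name "Book" are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataSchema:
    """A metadata schema: a named set of properties."""
    name: str
    properties: tuple


@dataclass
class MetadataRecord:
    """Metadata: property values that describe one information object."""
    schema: MetadataSchema
    values: dict = field(default_factory=dict)

    def set(self, prop, value):
        # Only properties defined by the schema may be used.
        if prop not in self.schema.properties:
            raise KeyError(
                f"{prop!r} is not defined by schema {self.schema.name!r}"
            )
        self.values[prop] = value


BOOK = MetadataSchema("Book", ("Title", "Author", "Genre"))

record = MetadataRecord(BOOK)
record.set("Title", "The Catcher in the Rye")
record.set("Author", "Salinger, J.D.")
record.set("Genre", "Fiction")

if __name__ == "__main__":
    print(record.values)
```

In this picture the dataclass machinery plays the role of the schema definition language, `BOOK` is a metadata schema, and `record` is the metadata itself.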
Google Rich Snippet Types

 • Reviews
 • People
 • Products
 • Businesses and organizations
 • Recipes
 • Events

http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/
Facebook




Microformats                                        RDFa
   flat namespace                                   XML namespaces
   support HTML4, XHTML 1.1, and HTML 5             support for XHTML 1.1
   use latent HTML attributes                       introduces new metadata attributes
   vocabulary defined by one organization/community open to any RDF-based vocabulary

cf.: http://evan.prodromou.name/RDFa_vs_microformats
Publishing Data
          GET http://dbpedia.org/resource/The_Catcher_in_the_Rye
          Accept: application/rdf+xml

          303 See Other
          Location: http://dbpedia.org/data/The_Catcher_in_the_Rye

          GET http://dbpedia.org/data/The_Catcher_in_the_Rye
          Accept: application/rdf+xml

          200 OK
          ...
          <?xml version="1.0" encoding="utf-8"?>
          <rdf:RDF ...