THE EVOLVING SEMANTIC
      WORLD



Barbara McGlamery
Taxonomist
Martha Stewart Living Omnimedia
ABOUT ME
   Masters in Library and Information Science
       Long Island University


   New York Public Library
     Branch librarian
     NYPL for the Performing Arts – Drama reference



   Entertainment Weekly
       Data Manager


   Time Inc.
       Senior Data Manager, Taxonomist, Metadata Architect, Ontologist


   Martha Stewart Living Omnimedia
       Taxonomist
WHAT IS THE SEMANTIC WEB?
The Semantic Web is a web of data…. (it) provides a
common framework that allows data to be shared and
reused across applications, enterprise, and community
boundaries.
                                           --w3c
"The Semantic Web is not a separate Web but an
extension of the current one, in which information is
given well-defined meaning, better enabling computers
and people to work in cooperation."

              --Tim Berners-Lee, James Hendler, and Ora Lassila,
                       Scientific American, 2001
The Semantic Web is about making knowledge
machine and human-readable
              --Amit Agarwal
http://www.labnol.org/internet/web-3-concepts-explained/8908/
Web 1.0        Web 2.0         Web 3.0
Connections   Collaboration   Intelligence
Big   S semantic web


         Little s semantic web
BIG S SEMANTIC WEB

…big "S" web technologies provide a
framework for describing data on a web page when
the data on the website is published. If data is read
or captured, because the data's semantic meaning
has already been described, you don't have to go
through the process of understanding the meaning
of the data after the fact.

                      --Sean Martin, CEO of Cambridge Semantics
LITTLE S SEMANTICS


 Little "s" web technologies capture and filter data with no
 description or understanding of the data provided after
 the capture process. The process of understanding the
 meaning of that data starts once data capture has
 happened. People have to intervene to provide the
 context and meaning for language on the web.
              --Sean Martin, CEO of Cambridge Semantics
Big S
       W3C-approved standard

Little s
       Looser groups of unaffiliated standards
BIG S SEMANTICS
ESSENTIALS OF BIG S SEMANTIC WEB
    URI – Uniform Resource Identifier

    RDF – Resource Description Framework

    OWL – Web Ontology Language

    Semantic reasoner (inference engine)
URI – UNIFORM RESOURCE IDENTIFIER

    Way to identify things
        Images, pages of text, locations

    De-referenceable
        Freebase
             http://www.freebase.com/view/en/will_smith


 • URIs are unique, no two are the same
         • Will Smith
             http://www.freebase.com/view/en/will_smith
RDF – RESOURCE DESCRIPTION FRAMEWORK

   Framework used to describe relationships between
    objects

   Extends and formalizes XML

   Subject>Predicate>Object
RDF – RESOURCE DESCRIPTION FRAMEWORK
                    Subject>Predicate>Object

   Will Smith  >  is the lead actor in  >  Bad Boys

   Subject:    http://ew.com/PersonsTax/Will_Smith
   Predicate:  http://ew.com/EntertainmentOnt/leadPerformanceIn
   Object:     http://ew.com/EntertainmentTax/Movies/Bad_Boys
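
As a minimal sketch of the Subject>Predicate>Object statement above, the three URIs from the slide can be written out as a single N-Triples line. The helper name to_ntriples is hypothetical, and the sketch assumes all three parts are URIs (no literals or blank nodes).

```python
# The three URIs are the illustrative ones shown on the slide.
SUBJECT = "http://ew.com/PersonsTax/Will_Smith"
PREDICATE = "http://ew.com/EntertainmentOnt/leadPerformanceIn"
OBJECT = "http://ew.com/EntertainmentTax/Movies/Bad_Boys"


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Format one statement as an N-Triples line (hypothetical helper).

    Assumes every part is a URI, so each is wrapped in angle brackets.
    """
    return f"<{subject}> <{predicate}> <{obj}> ."


triple_line = to_ntriples(SUBJECT, PREDICATE, OBJECT)
print(triple_line)
```

Because each part is a URI rather than a bare label, the same statement can be merged with data from another publisher without ambiguity about which Will Smith or which Bad Boys is meant.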
OWL – WEB ONTOLOGY LANGUAGE


…designed to be used by applications that need to
process the content of information instead of just
presenting it to humans
                             -- W3C
OWL – WEB ONTOLOGY LANGUAGE

   Metadata model
       Extends RDF to further define properties
           Ex: Equivalent relationships




           [Diagram: two arrows, each labeled "is married to"; images not preserved]
SEMANTIC REASONER
   Software able to infer logical consequences from a set
    of asserted facts

   Follows inference rules specified by OWL properties
        Inverse
        Transitive

        Symmetric

        Functional/Inverse functional

        Equivalent
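
To make the idea of inferring consequences from asserted facts concrete, here is a rough sketch of a toy reasoner covering three of the property types listed above (inverse, transitive, symmetric). It is not how a real OWL reasoner is built: the property names, the plain-string identifiers, and the infer function are all assumptions made for illustration, and functional, inverse functional, and equivalent properties are left out.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical property declarations, standing in for what an ontology would state.
TRANSITIVE = {"isLocatedIn"}
SYMMETRIC = {"isNear"}
INVERSE = {"ownerOf": "isOwnedBy", "isOwnedBy": "ownerOf"}


def infer(asserted: Set[Triple]) -> Set[Triple]:
    """Apply the three rules repeatedly until no new triples appear."""
    facts = set(asserted)
    while True:
        new: Set[Triple] = set()
        for s, p, o in facts:
            if p in SYMMETRIC:
                new.add((o, p, s))           # A near B  =>  B near A
            if p in INVERSE:
                new.add((o, INVERSE[p], s))  # A ownerOf B  =>  B isOwnedBy A
            if p in TRANSITIVE:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # A in B, B in C  =>  A in C
        if new <= facts:                     # nothing new: fixpoint reached
            return facts
        facts |= new


asserted = {
    ("Albuquerque", "isLocatedIn", "New Mexico"),
    ("New Mexico", "isLocatedIn", "USA"),
    ("Time/Life Building", "isNear", "Rockefeller Center"),
    ("Time Inc.", "ownerOf", "Entertainment Weekly"),
}
inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

The loop shows why inference can make queries expensive: every rule may produce facts that trigger other rules, so the reasoner has to keep going until the set of facts stops growing.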
PUTTING IT ALL TOGETHER
   Ontology
       Rule set
           Classes and Properties

   Taxonomy
       Application of Rule Set
           Tags and Relationships

   Everything is a statement
       Subject>Predicate>Object
            Ex: Will Smith is lead performer
             in Bad Boys
BENEFITS OF RDF/OWL

        Persistent URIs

        Verifiable XML

        Unambiguous Relationships

        Polyhierarchy

        Interoperability
LIMITATIONS OF RDF/OWL

        Difficult to propagate across web

        Challenge to integrate with legacy systems

        Expensive queries

        No “Killer App”
SEMANTIC WEB LAYER CAKE
LITTLE S SEMANTICS
RDFa   - Resource Description Framework (in) Attributes


     W3C recommendation that adds a set
      of attribute-level extensions to XHTML
      for embedding rich metadata within
      Web documents

          Easy to implement
          Not HTML 5 compliant
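
As an illustration of what "attribute-level extensions" means, the sketch below embeds one statement in an XHTML fragment using the RDFa attributes about and property, then reads it back with Python's standard HTML parser. The fragment, the foaf:name property choice, and the RdfaSketch class are assumptions for illustration; a real RDFa processor also resolves prefixes and tracks nesting, which this sketch ignores.

```python
from html.parser import HTMLParser

# Hypothetical XHTML fragment; prefix declarations are omitted for brevity.
SNIPPET = """
<div about="http://ew.com/PersonsTax/Will_Smith">
  <span property="foaf:name">Will Smith</span>
</div>
"""


class RdfaSketch(HTMLParser):
    """Collects (subject, property, text) from about/property attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent about attribute
        self.pending = None   # property waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


rdfa_parser = RdfaSketch()
rdfa_parser.feed(SNIPPET)
rdfa_triples = rdfa_parser.triples
print(rdfa_triples)
```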
RDFA: BEST BUY
LINKED OPEN DATA 2007




               “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/   ”
Linked Open Data
2010




                   “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
MICROFORMATS


        Semantic markup which seeks to re-use
         existing HTML/XHTML class attributes to
         structure data

            Easy to implement
            Limited formats
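
The sketch below shows the microformats approach of reusing the ordinary HTML class attribute. The recipe markup is a made-up example using class names from the hRecipe draft (hrecipe, fn, ingredient); the MicroformatSketch class and the FIELDS set are hypothetical and handle only flat, non-nested markup.

```python
from html.parser import HTMLParser

# Hypothetical recipe page fragment marked up with hRecipe-style class names.
SNIPPET = """
<div class="hrecipe">
  <h1 class="fn">Lemon Tart</h1>
  <span class="ingredient">2 lemons</span>
  <span class="ingredient">1 cup sugar</span>
</div>
"""

FIELDS = {"fn", "ingredient"}  # the class names this sketch looks for


class MicroformatSketch(HTMLParser):
    """Collects (class name, text) pairs for the class names in FIELDS."""

    def __init__(self):
        super().__init__()
        self.current = None
        self.found = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = [c for c in classes if c in FIELDS]
        self.current = matches[0] if matches else None

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.append((self.current, data.strip()))
            self.current = None


mf_parser = MicroformatSketch()
mf_parser.feed(SNIPPET)
recipe_fields = mf_parser.found
print(recipe_fields)
```

The fixed FIELDS set mirrors the "limited formats" point: a consumer only understands the class names that a published microformat defines.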
MICROFORMATS: BON APPÉTIT
MICRODATA



     A WHATWG HTML5 specification used to nest
      semantics within existing content on web pages

         Officially supported by Bing, Yahoo, & Google
         Can embed other markup languages like
          RDFa, microformats, and Dublin Core
         Not well-known (yet)
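
A minimal sketch of how microdata nests semantics in existing content: the itemscope, itemtype, and itemprop attributes wrap text that is already on the page. The fragment and the MicrodataSketch class are assumptions for illustration; the sketch reads only text content and does not handle nested items.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the type and property names come from schema.org.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Will Smith</span>
  <span itemprop="jobTitle">Actor</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects one dict per itemscope, with its itemprop text values."""

    def __init__(self):
        super().__init__()
        self.item = None      # the item currently being read
        self.pending = None   # itemprop waiting for its text content
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:  # valueless attribute, parsed as None
            self.item = {"type": attrs.get("itemtype"), "properties": {}}
            self.items.append(self.item)
        if "itemprop" in attrs:
            self.pending = attrs["itemprop"]

    def handle_data(self, data):
        if self.item is not None and self.pending and data.strip():
            self.item["properties"][self.pending] = data.strip()
            self.pending = None


md_parser = MicrodataSketch()
md_parser.feed(SNIPPET)
microdata_items = md_parser.items
print(microdata_items)
```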
MICRODATA:
STEVE: THE MUSEUM SOCIAL TAGGING PROJECT
OPEN GRAPH PROTOCOL

     Facebook-created markup that turns any web page
      into an Open Graph object, allowing any page
      to behave like a Facebook page

         I “Like” you
         Good for targeted advertising
         Limited in scope
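
For illustration, Open Graph markup lives in meta tags in the page head. The sketch below reads the og: properties from a made-up fragment; the URL, the values, and the OpenGraphSketch class are hypothetical, while og:title, og:type, and og:url are property names defined by the protocol.

```python
from html.parser import HTMLParser

# Hypothetical page head; the URL and values are placeholders.
SNIPPET = """
<head>
  <meta property="og:title" content="Lemon Tart" />
  <meta property="og:type" content="article" />
  <meta property="og:url" content="http://www.example.com/lemon-tart" />
</head>
"""


class OpenGraphSketch(HTMLParser):
    """Collects og: properties from meta tags into a dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.properties[prop] = attrs.get("content")


og_parser = OpenGraphSketch()
og_parser.feed(SNIPPET)
og_properties = og_parser.properties
print(og_properties)
```

The result is a flat set of key/value pairs about the page itself, which fits the "limited in scope" point: there are no relationships between things and nothing for a reasoner to infer from.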
OGP: MARTHA STEWART
BACK-OF-THE-NAPKIN COMPARISON
Features              RDF/OWL   RDFa   MF   MD   OGP
W3C standard          X         X           X
Extensible            X         X           X
Pre-existing vocabs   X         X
Uses URIs             X         X
Easy to implement               X      X    X    X
HTML 5 compliant                       X    X    X
Inferencing           X

(MF = microformats, MD = microdata, OGP = Open Graph Protocol)
STATUS REPORT ON         S SEMANTIC WEB
   Linked Open Data graph growing

   Many countries have developed government sites with
    rich semantics

   Development of Semantic search

   More widespread adoption of lighter semantics
WHERE WE MIGHT BE GOING

   Pharmaceutical industry identifies trends across clinical
    studies, and not just within them

   News industry better targets content by locale

   Department of Defense using it to make better decisions
    in the field

   Utilized in advertising to drive more and more revenue
QUESTIONS?
   Barbara McGlamery
    Taxonomist
    Martha Stewart Living Omnimedia
    (212)827-8817
    bmcglamery@marthastewart.com


Editor's Notes

  • #2 The landscape of the semantic web is changing. Early adopters learned the hard lessons for all of us: semantic web solutions can be difficult to implement and are perhaps not vital to every organization’s interests. Barbara McGlamery of Martha Stewart Living Omnimedia will share her experiences of building a Semantic Web tool from scratch for Time Inc. and describe how a smaller, more manageable initiative has been undertaken at Martha Stewart. She’ll share case studies and lessons learned and give a glimpse of how she sees the industry evolving.
  • #3 Hello my name is blah. I am not a technical librarian, I am a librarian and when I was practicing it was in reference, not systems or back-end. So most of you out there have my respect and awe at knowing how the inside of a cataloging terminal works or TK (find out something LITA librarians do). My foray into the more technical aspects of librarianship came through html and web development.
  • #5 Always refer to acronyms by full names: Resource Description Framework (RDF). Maybe a grid comparing RDF, Microformats, etc. Landscape of SW – who created it and why?
  • #11 In brief, The data is machine readable.
  • #13 In short
  • #16 Mention same as
  • #17 Extends and formalizes XML. Linking structure of the Web to use URIs to name the relationship between things. Ex:
  • #20 OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans.
  • #21 The difference between a semantic reasoner and a regular inference engine is that a semantic reasoner knows the rules of OWL. It is a more specific use.
    Inverse – Indicating the reciprocal property. For example, “owner of” is the inverse of the property “is owned by.”
    Transitive – Indicating that if this property applies between item 1 and item 2, and between item 2 and item 3, then it also applies between item 1 and item 3. For example, if Albuquerque “is located in” New Mexico, and New Mexico “is located in” the USA, then Albuquerque “is located in” the USA.
    Symmetric – Indicating that the inverse of this property is itself. For example, if the Time/Life Building “is near” Rockefeller Center, then it is also true that Rockefeller Center “is near” the Time/Life Building.
    Functional – Indicating that there can be only one value for this property for a given resource. For example, “has birth mother”: if a resource called Bob “has birth mother” Jane and also “has birth mother” Mrs. Smith, then we can assume that Jane and Mrs. Smith are the same person.
    Inverse Functional – Indicating that only one resource can have a given value for this property, which allows you to assume that if two or more resources have that value, they are really just two names for the same thing. This is very much like Functional, but in the opposite direction. For example, if there are two names with the same value for “has Social Security number,” we can assume that those are two names for the same person.
    Equivalent Property – Like Equivalent Class, indicates that this property can be extended to the same set of resources that use another property. For example, EW.com’s “lead performance” would be an equivalent property to People’s “starring role.”
  • #25 The bottom layers contain technologies that are well known from the hypertext web and that, without change, provide the basis for the semantic web.
    Middle layers contain technologies standardized by W3C to enable building semantic web applications.
    Top layers contain technologies that are not yet standardized or contain just ideas that should be implemented in order to realize the Semantic Web.
    Rules further extend OWL’s capabilities.
    Proof and Logic establish truth of statements, infer unstated facts.
    Trust – Cryptology, authentication, trustworthiness of statements.
    Semantic Web Foundations
    URI/IRI – URI is an acronym for Uniform Resource Identifier; a compact string of characters used to identify or name a resource. The URL to a web site (e.g. http://www.semanticfocus.com) is a popular example of a URI. IRI is an acronym for Internationalized Resource Identifier, which is a form of URI that uses characters beyond ASCII, thus becoming more useful in an international context.
    Unicode – Unicode is the universal standard encoding system and provides a unified system for representing textual data. Over 1 million characters can be encoded to specify any character in any language without a single escape sequence or control code. Before Unicode, there were several different encoding systems, which made communication and integration across borders a big pain. Now it's so much easier. Shout out to my peeps in Bangalore, 'haaaay' (अरे, दोस्त)!
    XML – XML is an acronym for Extensible Markup Language. With XML, we have a standard way to compose information so that it can be more easily shared. At the same time, it still affords the freedom to structure that information however the heck we want. It's kind of like HTML, only you get to make up your own tags and attributes. How cool is that?
    Namespaces – Namespaces (aka XML Namespaces) are integral to XML. Namespaces provide a means to qualify the tags and attributes in an XML document with URIs, which then makes them truly unique on the Web and thus universal (among other things).
    XML Schema – XML Schema describes the structure of XML documents just like DTDs, only better. An XML Schema is known as an XML Schema Definition (XSD). Basically, if you're going to use XML to invent your own document structures, XSD provides the way to define your rules (like guidelines) so that people and machines can understand them, adhere to them, and integrate with them.
    XML Query – XML Query (aka XQuery) is a standardized language for combining documents, databases, Web pages and almost anything else. It is very widely implemented, powerful, and easy to learn. XQuery is replacing proprietary middleware languages and Web Application development languages. XQuery is replacing complex Java or C++ programs with a few lines of code.
    Personally, I think it is sufficient to refer to these foundational items with just a few broad concepts: Unicode, URI, and XML. Unicode gives us a universal system for encoding information in all of the world's writing systems. URI gives us a standard way to identify and locate resources. XML gives us a way to model information uniquely, yet still share it and integrate it in consistent ways. All together, they help us integrate content and services throughout the Web.
  • #27 Really this should exist with the big S semantics, but it’s here because the implementation is so light and doesn’t require an inference engine or the use of unambiguous relationships at all. It is basically using URIs and embedding structured metadata into the HTML. Adds structured metadata to any XML-based language (e.g., XHTML).
  • #29 LOD is a web initiative for organizations to share information in RDF or RDFa format. The data describes the resource that the URI identifies. This makes it possible for a user (or software agent) to "follow your nose" to find out more information related to the identified resource. -- Wikipedia
  • #30 Citation!
  • #33 The Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving HTML
  • #34 Schema.org is the shared vocabulary initiative, launched by Bing, Google, and Yahoo, that is promoting the adoption of the microdata format
  • #38 Semantic search -- contextual meaning of terms as they appear in the searchable dataspace