       NERD meets NIF:
Lifting NLP Extraction Results
   to the Linked Data Cloud
        Giuseppe Rizzo and Raphaël Troncy
                EURECOM, France
      Sebastian Hellmann and Martin Bruemmer
            Universität Leipzig, Germany
What is a Named Entity recognition task?
A task that aims to locate and classify, in a textual document, the
names of persons, organizations, locations, brands and products, as well
as numeric expressions such as times, dates, money amounts and percentages




 16/04/2012        5th Workshop on Linked Data on the Web (LDOW2012)   2/15
NER tools

 Standalone software
 GATE
 Stanford CoreNLP
 Temis

 Web APIs




Factual comparison of 10 Web NER tools

Tool               Languages                 Granularity  Entity position  Classification schema          Classes  Response formats             Quota (calls/day)
AlchemyAPI         EN,FR,GR,IT,PT,RU,SP,SW   OEN          N/A              Alchemy                        324      JSON, MicroFormat, XML, RDF  30000
DBpedia Spotlight  EN,GR*,PT*,SP*            OEN          char offset      DBpedia, FreeBase, Schema.org  320      HTML, JSON, RDF, XML         unlimited
Evri               EN,IT                     OED          N/A              Evri                           5        HTML, JSON, RDF              3000
Extractiv          EN                        OEN          word offset      DBpedia                        34       HTML, JSON, RDF, XML         3000
Lupedia            EN,FR,IT                  OEN          range of chars   DBpedia, LinkedMDB             319      HTML, JSON, RDFa, XML        unlimited
OpenCalais         EN,FR,SP                  OEN          char offset      OpenCalais                     95       JSON, MicroFormat            50000
Saplo              EN,SW                     OED          N/A              N/A                            5        JSON                         1333
Wikimeta           EN,FR,SP                  OEN          POS offset       ESTER                          7        JSON, XML                    unlimited
Yahoo!             EN                        OEN          range of chars   Yahoo                          13       JSON, XML                    5000
Zemanta            EN                        OED          N/A              FreeBase                       81       XML, JSON, RDF               10000
What is NERD?
    ontology [1]    REST API [2]    UI [3]

    The NERD ontology has been integrated in the NIF project,
    developed in the context of the EU FP7 project
    LOD2: Creating Knowledge out of Interlinked Data

    [1] http://nerd.eurecom.fr/ontology
    [2] http://nerd.eurecom.fr/api/application.wadl
    [3] http://nerd.eurecom.fr


NERD Ontology




             Aligned the taxonomies used by
                      the extractors
Building the NERD Ontology

    NERD type          Occurrence
    Person                     10
    Organization               10
    Country                     6
    Company                     6
    Location                    6
    Continent                   5
    City                        5
    RadioStation                5
    Album                       5
    Product                     5
    ...                       ...




Ontology alignment validation

  5 TED talks
  1000 NYT news articles
  217 WWW2011 abstracts

Integration
  Different outputs for the NLP tools (standalone and Web APIs)

  OpenCalais
      "_type": "Organization",
      "name": "North Atlantic Treaty Organization",
      "organizationtype": "governmental civilian",
      "nationality": "N/A",
      "_typeReference": "http://s.opencalais.com/1/type/em/e/Organization",
      ...

  DBpedia Spotlight
      "@URI": "http://dbpedia.org/resource/DBpedia",
      "@types": "DBpedia:Software,DBpedia:Work",
      "@surfaceForm": "dbpedia",
      "@offset": "0",
      "@support": "11",
      "@similarityScore": "0.2387271374464035",
      ...

  For integration or reuse, manual effort is needed:
         time consuming
         difficult to track definitions
  NERD creates a sharable JSON/RDF annotation output
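
The manual effort mentioned above can be illustrated with a minimal sketch that maps the two fragments onto one record shape. The function names, the `extractor` key and the exclusive `endChar` convention are assumptions made for illustration; they are not taken from NERD itself, and only the fields shown on the slide are used.

```python
from typing import Any, Dict

def from_opencalais(item: Dict[str, Any]) -> Dict[str, Any]:
    """Map an OpenCalais-style fragment to a shared record (hypothetical shape)."""
    return {
        "entity": item.get("name"),
        "type": item.get("_type"),
        "uri": None,        # the fragment on the slide carries no entity URI
        "startChar": None,  # ... and no position information
        "endChar": None,
        "extractor": "opencalais",
    }

def from_spotlight(item: Dict[str, Any]) -> Dict[str, Any]:
    """Map a DBpedia Spotlight-style fragment to the same shared record."""
    # "@types" is a comma-separated string; keep the first type, if any.
    types = [t for t in item.get("@types", "").split(",") if t]
    surface = item["@surfaceForm"]
    start = int(item["@offset"])  # offsets arrive as strings in the fragment
    return {
        "entity": surface,
        "type": types[0] if types else None,
        "uri": item.get("@URI"),
        "startChar": start,
        "endChar": start + len(surface),  # assumption: end offset is exclusive
        "extractor": "dbpedia-spotlight",
    }

if __name__ == "__main__":
    print(from_opencalais({"_type": "Organization",
                           "name": "North Atlantic Treaty Organization"}))
    print(from_spotlight({"@URI": "http://dbpedia.org/resource/DBpedia",
                          "@types": "DBpedia:Software,DBpedia:Work",
                          "@surfaceForm": "dbpedia",
                          "@offset": "0"}))
```

Every additional extractor needs its own mapping function of this kind, which is the cost a shared annotation output is meant to remove.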

NERD REST API

  Resources:
      /document/{idDocument}
      /user/{idUser}
      /annotation/{extractor}
      /extraction/{idExtraction}
      /evaluation
      ...

  Methods: GET, POST, PUT, DELETE

  Output formats: JSON, RDF

  JSON output:
      "entities" : [{
          "entity": "W3C",
          "type": "Organization",
          "uri": "http://dbpedia.org/page/W3C",
          "nerdType": "http://nerd.eurecom.fr/ontology#Organization",
          "startChar": 30,
          "endChar": 32,
          "confidence": 1,
          "relevance": 0.5
      }]
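
As a rough sketch of how a client might consume the JSON output, the snippet below parses a payload shaped like the fragment on this slide and filters entities by NERD class. Wrapping the `"entities"` array in a top-level JSON object and the helper name `entities_of_type` are assumptions; no request to the actual API is made.

```python
import json
from typing import Any, Dict, List

NERD_NS = "http://nerd.eurecom.fr/ontology#"

# Sample payload copied from the slide; the enclosing {...} is an assumption.
SAMPLE = """
{"entities": [{
    "entity": "W3C",
    "type": "Organization",
    "uri": "http://dbpedia.org/page/W3C",
    "nerdType": "http://nerd.eurecom.fr/ontology#Organization",
    "startChar": 30,
    "endChar": 32,
    "confidence": 1,
    "relevance": 0.5
}]}
"""

def entities_of_type(payload: str, nerd_class: str) -> List[Dict[str, Any]]:
    """Return the entities whose nerdType is the given NERD ontology class."""
    doc = json.loads(payload)
    wanted = NERD_NS + nerd_class
    return [e for e in doc.get("entities", []) if e.get("nerdType") == wanted]

orgs = entities_of_type(SAMPLE, "Organization")

if __name__ == "__main__":
    for e in orgs:
        print(e["entity"], e["uri"], e["confidence"])
```

Filtering on `nerdType` rather than on `type` keeps the client independent of each extractor's own class labels.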




Textual annotation

 Let's consider the URI:
 http://www.w3.org/DesignIssues/LinkedData.html
     The Semantic Web isn't just about putting data on the web. It is about
     making links, so that a person or machine can explore the web of data.
     With linked data, when you have some of it, you can find other, related,
     data. …
     All the above plus, Use open standards from W3C (RDF and SPARQL) to
     identify things, so that people can point at your stuff
     ...

   entities: {
       …
       [entity: W3C, startChar: 23107, endChar: 23110],
       …
   }
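
The character-offset convention can be sketched on a short string. Here the offsets are relative to the sample sentence rather than to the full page (where the slide reports 23107 and 23110), and the exclusive end offset is an assumption suggested by the three-character difference between those two numbers. Both helper names are hypothetical.

```python
from typing import List, Tuple

def find_mentions(text: str, mention: str) -> List[Tuple[int, int]]:
    """Return (startChar, endChar) pairs for every occurrence of mention."""
    spans = []
    start = text.find(mention)
    while start != -1:
        spans.append((start, start + len(mention)))  # end offset is exclusive
        start = text.find(mention, start + 1)
    return spans

def surface_form(text: str, start_char: int, end_char: int) -> str:
    """Recover the annotated string from its offsets."""
    return text[start_char:end_char]

snippet = "Use open standards from W3C (RDF and SPARQL) to identify things"
spans = find_mentions(snippet, "W3C")

if __name__ == "__main__":
    for start, end in spans:
        print(start, end, surface_form(snippet, start, end))
```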
NERD meets NIF

                                      Model documents through a
                                      set of strings dereferenceable
                                      within the Web
                      :offset_23107_23110 a str:String ;
                               str:referenceContext :offset_0_26546 .

                                      Map string to entity
                      :offset_23107_23110 sso:oen dbpedia:W3C .


                                       Classification
                       dbpedia:W3C rdf:type nerd:Organization .
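
The three steps on this slide can be sketched as plain string formatting over one NERD entity record. This is an illustration under stated assumptions, not the NERD or NIF implementation: the function name `nif_triples` is hypothetical, prefix declarations are left out as they are on the slide, and the `dbpedia:` local name is simply taken from the last path segment of the entity URI.

```python
from typing import Any, Dict, List

def nif_triples(entity: Dict[str, Any], doc_length: int) -> List[str]:
    """Render Turtle-like lines for one entity, following the slide's pattern."""
    string_id = f":offset_{entity['startChar']}_{entity['endChar']}"
    context_id = f":offset_0_{doc_length}"  # the whole document as a string
    # Simplification: reuse the last URI segment as the prefixed local name.
    resource = "dbpedia:" + entity["uri"].rsplit("/", 1)[-1]
    nerd_class = "nerd:" + entity["nerdType"].rsplit("#", 1)[-1]
    return [
        # 1. model the mention as a string anchored in the document
        f"{string_id} a str:String ;",
        f"    str:referenceContext {context_id} .",
        # 2. map the string to the entity
        f"{string_id} sso:oen {resource} .",
        # 3. classify the entity with a NERD ontology class
        f"{resource} rdf:type {nerd_class} .",
    ]

w3c = {
    "uri": "http://dbpedia.org/page/W3C",
    "nerdType": "http://nerd.eurecom.fr/ontology#Organization",
    "startChar": 23107,
    "endChar": 23110,
}
lines = nif_triples(w3c, 26546)

if __name__ == "__main__":
    print("\n".join(lines))
```

A real pipeline would more likely build the graph with an RDF library and declare the prefixes explicitly; string formatting is used here only to keep the mapping visible.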



NERD User Interface




Conclusions and perspectives
NERD UI and REST API
   unified interface for extracting NEs from various types of text
NERD ontology
    common schema for entity classification
NERD & NIF
    lift the extraction annotation results to the LOD cloud

Systematic comparison for the NE extraction and
classification tasks:
      ETAPE corpus
      CoNLL 2003 corpus

Combining several extractions to improve on the strengths
of a single tool
Thanks for your time and your attention




             http://nerd.eurecom.fr

             @giusepperizzo @rtroncy #nerd



             http://www.slideshare.net/giusepperizzo




