NERD: an open source
platform for extracting and
disambiguating named entities
in very diverse documents
Raphaël Troncy <raphael.troncy@eurecom.fr>
Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>
What is a Named Entity Recognition task?
• A task that aims to locate and classify, in a textual document, the names of
persons, organizations, locations, brands and products, as well as numeric
expressions such as times, dates, amounts of money and percentages

22/10/2013 -

NLP&DBpedia International Workshop, Sydney, October 2013

-2
Example
• “I want to book a room in an hotel located in the heart of Paris, just a stone’s throw from the Eiffel Tower”

Eric Charton, “Named Entity Detection and Entity Linking in the
Context of Semantic Web: Exploring the ambiguity question”

Part of Speech

Token:  I    want  to  book  a   room  in  …  Paris
POS:    PRP  VBP   TO  VB    DT  NN    IN  …  NNP

NER: What is Paris?
NEL: Which Paris are we talking about?

Giuseppe Rizzo, “Learning with the Web: Structuring data to ease machine understanding”

What is Paris? Type Ambiguity

dbpedia-owl:Asteroid

schema:City

schema:Movie
dbpedia-owl:Film

Giuseppe Rizzo, “Learning with the Web: Structuring data to
ease machine understanding”

Named Entity Recognition (NER)

Token:  I    want  to  book  a   room  in  …  Paris
POS:    PRP  VBP   TO  VB    DT  NN    IN  …  NNP
NER:    O    O     O   O     O   O     O   …  LOC

Giuseppe Rizzo, “Learning with the Web: Structuring data to ease machine understanding”

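The per-token labels of the NER slide (O for "outside", LOC for a location) can be turned back into entity mentions by grouping consecutive tokens that carry the same non-O label. The following is a minimal sketch of that grouping, not the code of any particular tagger; the function name `extract_entities` is an illustrative assumption.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group consecutive tokens sharing the same non-"O" tag into mentions."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_tag = "O"
    for token, tag in zip(tokens, tags):
        if tag == current_tag and tag != "O":
            # Same entity type as the previous token: extend the mention.
            current_tokens.append(token)
            continue
        if current_tokens:
            # The tag changed: close the mention collected so far.
            entities.append((" ".join(current_tokens), current_tag))
        current_tokens = [token] if tag != "O" else []
        current_tag = tag
    if current_tokens:
        entities.append((" ".join(current_tokens), current_tag))
    return entities


tokens = ["I", "want", "to", "book", "a", "room", "in", "Paris"]
tags = ["O", "O", "O", "O", "O", "O", "O", "LOC"]
print(extract_entities(tokens, tags))
```

With this simple O/type labelling, two adjacent entities of the same type would be merged into one mention; schemes with explicit begin markers avoid that at the cost of more labels.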
What is Paris? Name Ambiguity

Paris, Kentucky

Paris, France

Paris, Maine

Paris, Idaho

Paris, Tennessee

Paris, Ontario

Giuseppe Rizzo, “Learning with the Web: Structuring data to
ease machine understanding”
Named Entity Linking (NEL)

Token:  I    want  to  book  a   room  in  …  Paris
POS:    PRP  VBP   TO  VB    DT  NN    IN  …  NNP
NER:    O    O     O   O     O   O     O   …  LOC
NEL:    O    O     O   O     O   O     O   …  http://dbpedia.org/resource/Paris

Giuseppe Rizzo, “Learning with the Web: Structuring data to ease machine understanding”

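To make the linking step concrete, here is a toy sketch that picks one URI for the mention "Paris" by counting how many context words match each candidate. The candidate index, its keyword lists and the function name `link_entity` are all hypothetical; real linkers such as those wrapped by NERD use far richer evidence.

```python
import re
from typing import Dict, List, Optional

# Hypothetical candidate index: surface form -> {candidate URI: context keywords}.
CANDIDATES: Dict[str, Dict[str, List[str]]] = {
    "Paris": {
        "http://dbpedia.org/resource/Paris": ["france", "eiffel", "tower", "seine"],
        "http://dbpedia.org/resource/Paris,_Kentucky": ["kentucky", "bourbon", "county"],
        "http://dbpedia.org/resource/Paris,_Ontario": ["ontario", "canada", "river"],
    }
}


def link_entity(mention: str, context: str) -> Optional[str]:
    """Return the candidate URI whose keywords overlap most with the context."""
    candidates = CANDIDATES.get(mention)
    if not candidates:
        return None  # Unknown surface form: nothing to link to.
    context_words = set(re.findall(r"\w+", context.lower()))
    # On ties, max() keeps the first candidate in insertion order.
    return max(candidates, key=lambda uri: len(context_words & set(candidates[uri])))


sentence = (
    "I want to book a room in an hotel located in the heart of Paris, "
    "just a stone's throw from the Eiffel Tower"
)
print(link_entity("Paris", sentence))
```

The mention of the Eiffel Tower in the example sentence is what tips the toy score towards the French capital.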
NER Tools and Web APIs
• Standalone software
  • GATE
  • Stanford CoreNLP
  • Temis

• Web APIs

http://nerd.eurecom.fr/

NERD: Named Entity Recognition and Disambiguation
• Compare the performance of NER and NEL tools
• Understand the strengths and weaknesses of different Web APIs
• Adapt NER processing to different contexts
• (Learn how to) combine NER (/ NEL) tools
• Participate in various benchmarks

What is NERD?
ontology [1]
REST API [2]
UI [3]

[1] http://nerd.eurecom.fr/ontology
[2] http://nerd.eurecom.fr/api/application.wadl
[3] http://nerd.eurecom.fr

Factual comparison of 10 Web NER tools

Tool               Language                 Granularity  Entity position  Classification schema          Number of classes  Response format              Quota (calls/day)
Alchemy API        EN,FR,GR,IT,PT,RU,SP,SW  OEN          N/A              Alchemy                        324                JSON, MicroFormat, XML, RDF  30000
DBpedia Spotlight  EN, GR*, PT*, SP*        OEN          char offset      DBpedia, FreeBase, Schema.org  320                HTML, JSON, RDF, XML         unl
Evri               EN, IT                   OED          N/A              Evri                           5                  HTML, JSON, RDF              3000
Extractiv          EN                       OEN          word offset      DBpedia                        34                 HTML, JSON, RDF, XML         3000
Lupedia            EN, FR, IT               OEN          range of chars   DBpedia, LinkedMDB             319                HTML, JSON, RDFa, XML        unl
Open Calais        EN, FR, SP               OEN          char offset      Open Calais                    95                 JSON, MicroFormat            50000
Saplo              EN, SW                   OED          N/A              N/A                            5                  JSON                         1333
Wikimeta           EN, FR, SP               OEN          POS offset       ESTER                          7                  JSON, XML                    unl
Yahoo!             EN                       OEN          range of chars   Yahoo                          13                 JSON, XML                    5000
Zemanta            EN                       OED          N/A              FreeBase                       81                 XML, JSON, RDF               10000

NERD Ontology

Aligned the taxonomies used by
the extractors
Building the NERD Ontology

NERD type     Occurrence
Person        10
Organization  10
Country       6
Company       6
Location      6
Continent     5
City          5
RadioStation  5
Album         5
Product       5
...           ...

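Aligning the extractors' taxonomies amounts to a lookup from an extractor-specific type to a NERD ontology type. The sketch below illustrates the idea only: every entry of `ALIGNMENT`, the `nerd:Thing` fallback and the function name are assumptions made for the example, not the actual alignment published at http://nerd.eurecom.fr/ontology.

```python
from typing import Dict, Tuple

# Hypothetical alignment table: (extractor, extractor-specific type) -> NERD type.
ALIGNMENT: Dict[Tuple[str, str], str] = {
    ("alchemyapi", "City"): "nerd:City",
    ("opencalais", "City"): "nerd:City",
    ("alchemyapi", "Person"): "nerd:Person",
    ("wikimeta", "PERS"): "nerd:Person",
    ("wikimeta", "LOC"): "nerd:Location",
}


def to_nerd_type(extractor: str, native_type: str, fallback: str = "nerd:Thing") -> str:
    """Map an extractor-specific type to a NERD type, or to the fallback."""
    return ALIGNMENT.get((extractor.lower(), native_type), fallback)


print(to_nerd_type("Wikimeta", "LOC"))
print(to_nerd_type("lupedia", "SomeUnknownType"))
```

Keeping the extractor name in the key matters because two tools may use the same label for different concepts.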
NERD REST API

RDF

/document
/user
/annotation/{extractor}
/extraction
/evaluation
...

GET, POST, PUT, DELETE

JSON
"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]

Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.

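A client can use the `startChar` and `endChar` offsets of the JSON response to recover each mention from the submitted text. The sketch below parses a payload shaped like the slide's example; wrapping the `entities` array in a top-level object, the sample sentence and the function name `entity_mentions` are assumptions, and no call to the live API is made.

```python
import json
from typing import List, Tuple

# Sample input text, chosen so that the mention sits at characters 30..45.
text = "The World Wide Web is owed to Tim Berners-Lee."

# Payload modelled on the slide; the enclosing object is an assumption.
response = """
{"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]}
"""


def entity_mentions(document: str, payload: str) -> List[Tuple[str, str]]:
    """Return (surface form, NERD type local name) for each returned entity."""
    data = json.loads(payload)
    mentions = []
    for entity in data["entities"]:
        surface = document[entity["startChar"]:entity["endChar"]]
        nerd_type = entity["nerdType"].split("#")[-1]
        mentions.append((surface, nerd_type))
    return mentions


print(entity_mentions(text, response))
```

Slicing by offsets rather than searching for the `entity` string keeps repeated mentions of the same name distinguishable.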
NERD meets NIF

Model documents through a set of strings dereferenceable on the Web
:offset_23107_23110 a str:String ;
    str:referenceContext :offset_0_26546 .

Map string to entity
:offset_23107_23110 sso:oen dbpedia:W3C .

Classification
dbpedia:W3C rdf:type nerd:Organization .

Rizzo G., Troncy R., Hellmann S. and Bruemmer M. (2012), NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud. In: (LDOW'12) Linked Data on the Web (WWW'12), Lyon, France.

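The three groups of statements on the NIF slide follow a regular pattern that can be generated from an annotation's offsets. This sketch only formats Turtle-like lines as strings, reusing the prefixes shown on the slide (`str:`, `sso:`, `nerd:`); the function name `nif_statements` is an assumption and no RDF library is involved.

```python
from typing import List


def nif_statements(doc_length: int, start: int, end: int,
                   entity: str, nerd_type: str) -> List[str]:
    """Format the string, entity-mapping and classification statements."""
    context = f":offset_0_{doc_length}"   # the whole document as a string
    mention = f":offset_{start}_{end}"    # the annotated substring
    return [
        f"{mention} a str:String ;",
        f"    str:referenceContext {context} .",
        f"{mention} sso:oen {entity} .",
        f"{entity} rdf:type {nerd_type} .",
    ]


for line in nif_statements(26546, 23107, 23110, "dbpedia:W3C", "nerd:Organization"):
    print(line)
```

Because the identifier encodes the offsets, the same mention always receives the same identifier within a given document.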
NERD User Dashboard

NERD User Interface

History of NER benchmarks
• CoNLL 2003 and CoNLL 2005
  • schema (4 types): person, organization, location and miscellaneous

• ACE 2004, ACE 2005 and ACE 2007
  • schema (7 types): person, organization, location, facility, weapon, vehicle and geo-political entity
  • entity recognition, co-reference, finding relationships among the extracted entities

• TAC 2009 (Knowledge Base Track)
  • schema (3 types): person, organization and location
  • create a knowledge base from the extracted named entities

• ETAPE 2012 (Named Entity Task)
  • schema: Quaero (7 main types, 32 sub-types)

• MSM 2013: tweet corpus!
  • schema (4 types): person, organization, location, miscellaneous

ETAPE 2012 challenge

Genre          Train    Dev     Test    Sources
TV news        7h 40m   1h 40m  1h 40m  BFM Story, Top Questions (LCP)
TV debates     10h 30m  5h 10m  5h 10m  Pile et Face, Ca vous regarde, Entre les lignes (LCP)
TV amusements  -        1h 05m  1h 05m  La place du village (TV8)

                      Train   Dev      Eval
Item length           26h     10h 55m  10h 55m
Nb files              44      15       15
Nb words              290517  91656    115511
Nb Named Entities     46763   14398    13055
Nb unique categories  33      33       33

NERD @ ETAPE (naïve combined strategy)

extraction → cleaning → fusion

Tuples produced by the extractors:
(eA1, tA1, URIA1, siA1, eiA1)
(eA2, tA2, URIA2, siA2, eiA2)
(eA3, tA3, URIA3, siA3, eiA3)
...

Tuples after fusion:
(eN1, tN1, URIN1, siN1, eiN1)
(eN2, tN2, URIN2, siN2, eiN2)

When at least 2 extractors classify the same entity with a different type, we apply a preferred selection order (empirically defined): Wikimeta, AlchemyAPI, OpenCalais, Lupedia

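The fusion rule of the naïve combined strategy can be sketched as follows. The tuple fields mirror the slide's (e, t, URI, si, ei); treating two annotations as "the same entity" when their character spans are identical is an assumption of this sketch, as are the names `Annotation` and `fuse`, and the cleaning step is not shown.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Annotation(NamedTuple):
    entity: str   # e: surface form
    etype: str    # t: type assigned by the extractor
    uri: str      # URI: disambiguation link (may be empty)
    start: int    # si: start offset
    end: int      # ei: end offset


# Preferred selection order from the slide, highest priority first.
PRIORITY = ("wikimeta", "alchemyapi", "opencalais", "lupedia")


def fuse(extractions: Dict[str, List[Annotation]],
         priority: Sequence[str] = PRIORITY) -> List[Annotation]:
    """Keep one annotation per span, taken from the highest-priority extractor."""
    fused: Dict[Tuple[int, int], Annotation] = {}
    for extractor in priority:
        for annotation in extractions.get(extractor, []):
            # setdefault keeps the first (highest-priority) annotation per span.
            fused.setdefault((annotation.start, annotation.end), annotation)
    return sorted(fused.values(), key=lambda a: a.start)


example = {
    "alchemyapi": [Annotation("Paris", "City", "", 27, 32)],
    "wikimeta": [Annotation("Paris", "LOC", "http://dbpedia.org/resource/Paris", 27, 32)],
    "lupedia": [Annotation("Eiffel Tower", "Place", "", 50, 62)],
}
fused = fuse(example)
print([(a.entity, a.etype) for a in fused])
```

Passing a different `priority` sequence is enough to experiment with other selection orders.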
Participation at ETAPE (combined+ strategy)

[Diagram labels: ETAPE Train & Dev, POS tagger, Learned model, Created static rules, Apply rules, fusion]

Tuples produced by the extractors and by our own rules:
(eA1, tA1, URIA1, siA1, eiA1)
(eA2, tA2, URIA2, siA2, eiA2)
...
(e1, t1, URI1, si1, ei1)

Tuples after fusion:
(eN1, tN1, URIN1, siN1, eiN1)
(eN2, tN2, URIN2, siN2, eiN2)

Conflicts handled by priority selection: own, Wikimeta, AlchemyAPI, OpenCalais, Lupedia

NERD Global results

           SLR      Precision  Recall  F-measure  %correct
combined   86.85%   35.31%     17.69%  23.44%     17.69%
combined+  188.81%  15.13%     28.40%  19.45%     28.40%

Combined+: the Eval corpus differs substantially from the Train & Dev corpora. The static rules do not fit the Eval corpus well and introduce classification noise.

Per-extractor results

                   SLR      Precision  Recall  F-measure  %correct
alchemyapi         37.71%   47.95%     5.45%   9.68%      5.45%
lupedia            39.49%   22.87%     1.56%   2.91%      1.56%
opencalais         37.47%   41.69%     3.53%   6.49%      3.53%
wikimeta           36.67%   19.40%     4.25%   6.95%      4.25%
combined (nerd)    86.85%   35.31%     17.69%  23.44%     17.69%
combined+ (nerd+)  188.81%  15.13%     28.40%  19.45%     28.40%

Learning How to Combine NER Extractors

NERD on CoNLL 2003 (NER task)

NERD on MSM 2013 (NER task)

NERD on MSM 2013 (NEL task)

Media Fragment Enricher: http://mfe.synote.org/mfe/

Linking pieces of knowledge

Named Entities for Video Classification

Workflow

1: Video URL
2: Metadata
3: metadata
4: NERDify
5: Timed Text
6: NEs with time alignment (json)
7: RDFize (ttl)
8: Generate Category
9: SPARQL query

Components:
• Media Fragment Enricher UI: video and metadata preview; video replay with subtitles and aligned NEs
• Media Fragment Enricher Services: Metadata & timed-text, NERD Client, RDFizator, Categorization
• Triple Store

Channel signature based on NE distribution
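
The slide gives only a title. One plausible reading, assumed here, is that a channel is characterised by the relative frequency of each NERD type among the entities extracted from its videos. The sketch below computes such a distribution; the function name `channel_signature` and the sample types are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def channel_signature(entity_types: Iterable[str]) -> Dict[str, float]:
    """Return the share of each entity type among all extracted entities."""
    counts = Counter(entity_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {nerd_type: count / total for nerd_type, count in counts.items()}


# Hypothetical types extracted from the videos of one channel.
types = ["nerd:Person", "nerd:Person", "nerd:Location", "nerd:Organization"]
print(channel_signature(types))
```

Normalising by the total makes channels with different numbers of videos comparable.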

LinkedTV: automatic annotations ...

... and enrichment for hypervideos

[Figure labels: CONCEPT IN PLAYER (Cubism, Expressionism, Fauvism); FACETS / PROPERTIES OF CONCEPT; CONTENT ENRICHMENT]

Media Fragments and Annotations

http://data.linkedtv.eu/media/e2899e7f#t=840,900

nerd:Location Casablanca
nerd:Location Cafe Rick
nerd:Person H. Bogart
nerd:Person I. Bergman

• Media Fragment URI 1.0
  • Chapters
  • Scenes
  • Shots
  • etc…

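The fragment identifier `#t=840,900` on the slide is the temporal dimension of Media Fragment URI 1.0, with start and end given in seconds. The sketch below builds such URIs and attaches an entity to every subtitle cue that mentions it; the cue texts and the function names are hypothetical, and the matching is a plain substring test rather than NERD's time alignment.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, subtitle text)


def _seconds(value: float) -> str:
    """Render whole seconds without a trailing '.0'."""
    return str(int(value)) if float(value).is_integer() else str(value)


def media_fragment_uri(media_uri: str, start: float, end: float) -> str:
    """Append a temporal fragment (#t=start,end) to a media URI."""
    return f"{media_uri}#t={_seconds(start)},{_seconds(end)}"


def fragments_for_entity(media_uri: str, cues: List[Cue], surface: str) -> List[str]:
    """Return the fragment URIs of all cues whose text mentions the entity."""
    return [
        media_fragment_uri(media_uri, start, end)
        for start, end, text in cues
        if surface.lower() in text.lower()
    ]


MEDIA = "http://data.linkedtv.eu/media/e2899e7f"
# Hypothetical subtitle cues for the seed video.
cues = [
    (840, 900, "Everybody in Casablanca comes to Rick's cafe."),
    (900, 960.5, "She walks in."),
]
print(fragments_for_entity(MEDIA, cues, "Casablanca"))
```

The same URI pattern can address chapters, scenes or shots, since each is just a different (start, end) pair on the same media resource.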
Enrichment and Hypervideos

nerd:Location Casablanca
nerd:Location Cafe Rick
nerd:Person H. Bogart
nerd:Person E. Tierney
nerd:Person I. Bergman
nerd:Location China

Media Fragment + Open Annotation + NERD
Locator

MediaResource
OffsetBasedString

Annotation

MediaFragment

Entity
Type

URL (hyperlink)

Towards a Linked Media Layer
• Enriching media with media from a closed collection (e.g. BBC archive)
  • The MediaEval scenario (~ 1697 hours of archived BBC video)
    http://www.multimediaeval.org/mediaeval2013/hyper2013/

• Enriching media with content from the open web
  • LinkedTV scenarios: whitelisted web sites for each program
  • Media Collector for Social Media

Seed video enriched with web content
rbbaktuell_20120809

nerd:Location Brandenburg

Enrichments are Annotations too

Media Finder (named entities clustering)

Media Finder (zooming in a cluster)

Media Finder: http://mediafinder.eurecom.fr/
• Live Topic Generation from Event Streams
  • WWW 2013 Demo Session
  • http://www.youtube.com/watch?v=8iRiwz7cDYY

Credits
• Giuseppe Rizzo, Vuk Milicic, José Luis Redondo Garcia (EURECOM)
• Thomas Steiner (Google Inc.)
• Marieke van Erp (Free University of Amsterdam)
• Yunjia Li (University of Southampton)
• … and many other students

http://www.slideshare.net/troncy
