How well does your Instance Matching system perform?
Experimental evaluation with LANCE
Tzanina Saveta, Evangelia Daskalaki, Giorgos Flouris, Irini Fundulaki
Institute of Computer Science – FORTH, Greece
Axel-Cyrille Ngonga Ngomo
IFI/AKSW, University of Leipzig, Germany
10/31/16	 ISWC	2016:	How	well	does	your	Instance	Matching	system	perform?	Experimental	evaluation	with	LANCE	 1
Why	Instance	Matching?	
*Adapted	from	Suchanek	&	Weikum	tutorial@SIGMOD	2013	
Different sources contain different descriptions of the same real-world entity
Instance	Matching	for	Linked	Data	
•  A set of RDF triples constitutes an RDF graph
•  Sparse data
•  Rich semantics expressed in terms of ontologies
•  Large number of sources to integrate
•  Value, structure and semantics heterogeneities
*Adapted	from	Suchanek	&	Weikum	tutorial@SIGMOD	2013
Benchmarking	
Instance matching has led to the development of a number of matching techniques and tools
•  How to compare those?
•  How to assess their performance (efficiency and effectiveness)?
•  How to “push” systems into becoming better?
•  Benchmark your systems!
Instance	Matching	Benchmark	Components	
•  Datasets
–  Source and target datasets that are matched against each other to find the entities that refer to the same real-world object
•  Ground truth / Gold standard / Reference alignment
–  The “correct answer sheet” used to judge the completeness and soundness of the results produced by the system under test (SUT)
•  Organized into test cases, each addressing a different kind of instance matching requirement
•  Metrics
–  The performance metric(s) that determine the systems’ efficiency and effectiveness
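The role of the gold standard can be sketched in a few lines. The slides do not spell out the effectiveness metrics, so this sketch assumes the usual precision (soundness), recall (completeness) and F-measure; the function name `evaluate` and the `s:`/`t:` identifiers are hypothetical and only serve the illustration.

```python
from typing import Set, Tuple

Match = Tuple[str, str]  # (source instance URI, target instance URI)


def evaluate(found: Set[Match], gold: Set[Match]) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) of the found matches against the gold standard."""
    true_positives = len(found & gold)
    # Precision: share of reported matches that are correct (soundness).
    precision = true_positives / len(found) if found else 0.0
    # Recall: share of correct matches that were reported (completeness).
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy gold standard with four matches; the system reports three, one of them wrong.
gold = {("s:1", "t:1"), ("s:2", "t:2"), ("s:3", "t:3"), ("s:4", "t:4")}
found = {("s:1", "t:1"), ("s:2", "t:2"), ("s:3", "t:9")}
precision, recall, f_measure = evaluate(found, gold)
```

In this toy run two of the three reported matches are correct and two of the four gold matches are found, so precision is 2/3 and recall is 1/2.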
	
LANCE	
•  A novel instance matching benchmark generator
•  Domain-independent
•  Highly configurable and scalable
•  Standard value-based and structure-based test cases
•  Advanced semantics-aware test cases considering OWL2 expressive constructs
•  Rich weighted gold standard
•  Additional metrics: similarity score metric
LANCE	Architecture	
[Figure: LANCE architecture diagram. Labels in the figure: Source Data; Test Case Generation Parameters; Data Ingestion Module; RDF Repository; Initialization Module; SPARQL Queries (Schema Stats); Resource Generator; SPARQL Queries (IR); Test Case Generator; Resource Transformation Module; Weight Computation Module; RESCAL [NT12]; MATCHER; SAMPLER; Matched Instances; Source Data; Target Data; Weighted Gold Standard]
Test	Cases	
Test cases are built using a variety of transformations
•  Value-based test cases
–  Transformations of values of data type properties
•  Structure-based test cases
–  Transformations of structure of object and data type properties
•  Semantics-aware test cases
–  Transformations at the instance level considering the schema
•  Simple and complex combinations of the first three categories
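As an illustration of the value-based category above, here is a minimal sketch of one possible transformation: a typo injected into the literal of a data type property. It is not LANCE's implementation; the functions `swap_adjacent_chars` and `transform_values`, the `ex:` identifiers and the choice of a character swap are all assumptions made for the example.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, literal value)


def swap_adjacent_chars(value: str, rng: random.Random) -> str:
    """Introduce a typo by swapping one pair of adjacent characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def transform_values(source: List[Triple], predicate: str, seed: int = 0) -> List[Triple]:
    """Copy the source triples, perturbing only the literals of the given data type property."""
    rng = random.Random(seed)  # seeded so the generated target dataset is reproducible
    target = []
    for s, p, o in source:
        if p == predicate:
            target.append((s, p, swap_adjacent_chars(o, rng)))
        else:
            target.append((s, p, o))
    return target


# Hypothetical source dataset: only the title literal is transformed.
source = [
    ("ex:work1", "ex:title", "Instance Matching"),
    ("ex:work1", "ex:year", "2016"),
]
target = transform_values(source, "ex:title")
```

A matcher then has to recognise that the perturbed title in the target still describes the same instance as the source.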
LANCE	Performance	Metrics	
•  Average similarity score: indicates the average difficulty of the matched instances
–  Benchmark with a high average similarity score: matched instances are easier to find
•  Standard deviation: spread of the similarity scores of the matched instances
–  Benchmark with a high standard deviation:
•  scores are spread out from the average
•  more heterogeneity among the matched instances
Obtain a more fine-grained understanding of the IM system’s performance by comparing the average similarity score and standard deviation of the system with those of the benchmark
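The comparison described above can be sketched as follows, assuming the weighted gold standard is available as a mapping from matched pairs to similarity scores. The helper `score_profile`, the toy scores and the use of the population standard deviation are assumptions; the slides do not say which variant LANCE uses.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Match = Tuple[str, str]  # (source instance URI, target instance URI)


def score_profile(scores: Iterable[float]) -> Tuple[float, float]:
    """Return (average similarity score, population standard deviation)."""
    values = list(scores)
    return mean(values), pstdev(values)


# Hypothetical weighted gold standard: lower scores mean harder matches.
weighted_gold: Dict[Match, float] = {
    ("s:1", "t:1"): 0.9,
    ("s:2", "t:2"): 0.7,
    ("s:3", "t:3"): 0.5,
    ("s:4", "t:4"): 0.3,
}

# Matches reported by a system; only true positives carry a gold-standard score.
found = {("s:1", "t:1"), ("s:2", "t:2"), ("s:9", "t:9")}
true_positive_scores = [weighted_gold[m] for m in found if m in weighted_gold]

benchmark_avg, benchmark_std = score_profile(weighted_gold.values())
system_avg, system_std = score_profile(true_positive_scores)
```

Here the system's average (about 0.8) lies above the benchmark's (about 0.6) and its spread is smaller, which would suggest that it mostly found the easier matches.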
Experiments	
•  Efficiency and effectiveness of IM systems using LANCE benchmarks
–  Systems:
•  LogMap Version 2.4 [JG11] (MORe Reasoner [RG13])
•  OtO [DP12]
•  LIMES (EAGLE IM algorithm [NL12])
–  Datasets:
•  LDBC’s SPIMBENCH Generator (Semantic Publishing Benchmark)
•  UOBM
–  Matching task:
•  All 5 categories introduced previously
•  All instances were transformed
	
SPIMBENCH:	Standard	Metrics	
•  LogMap
–  Responds well to the value-based test cases
–  Performance drops when semantics-aware test cases are also applied
SPIMBENCH:	Standard	Metrics	
•  OtO and EAGLE
–  Give good results on the value-based transformations
–  Show reduced performance in the remaining categories
•  EAGLE is non-deterministic and uses unsupervised learning
UOBM:	Standard	Metrics	
•  LogMap
1. Does not perform well in any of the categories
2. Performance is not affected by the dataset size
•  OtO
1. Performs better
2. Performance drops as the dataset size increases
SPIMBENCH:	Additional	Metrics	
Distribution of similarity scores for LANCE and True Positive matches from IM systems for semantics-aware test cases in the case of the 10K triples dataset.

•  LogMap can address difficult test cases

•  EAGLE & OtO can address mostly value-based test cases
	
[Figure: log(# of mappings) against similarity scores (0.7 to 1) for OtO, EAGLE, LogMap and LANCE; second panel: Standard Deviation]
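A distribution like the one plotted on this slide can be approximated by bucketing similarity scores and counting the mappings per bucket. The following is a minimal sketch; the function `score_histogram`, the 0.02-wide buckets starting at 0.7 (chosen to mirror the axis ticks) and the sample scores are assumptions, not data from the experiments.

```python
from collections import Counter
from typing import Dict, Iterable


def score_histogram(scores: Iterable[float], lower: float = 0.7,
                    width: float = 0.02) -> Dict[float, int]:
    """Count scores per bucket; each bucket is keyed by its lower bound."""
    # Work in integer hundredths to avoid floating-point bucket boundary errors.
    lower_h = round(lower * 100)
    width_h = round(width * 100)
    counts: Counter = Counter()
    for score in scores:
        score_h = round(score * 100)
        if score_h < lower_h:
            continue  # below the plotted range
        bucket_h = lower_h + ((score_h - lower_h) // width_h) * width_h
        counts[bucket_h / 100] += 1
    return dict(sorted(counts.items()))


# Made-up similarity scores of true positive matches.
histogram = score_histogram([0.71, 0.70, 0.75, 0.98, 1.0, 0.99, 0.65])
```

Running the same function over the weighted gold standard and over each system's true positives gives one series per line of the plot; the counts would then be drawn on a logarithmic axis.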
UOBM:	Additional	Metrics	
Distribution of similarity scores for LANCE and True Positive matches from IM systems for structure-based test cases in the case of the 10K triples dataset.

•  LogMap cannot address well the change of URIs in the instances
[Figure: log(# of mappings) against similarity (0.6 to 0.9) for OtO, LogMap and LANCE; second panel with values from 0 to 0.08 for OtO, LogMap and LANCE]
Lessons	Learned	
	
•  Different types of transformations affect an IM system’s performance
•  The characteristics of the source datasets affect the behavior of IM systems
Questions?	
Acknowledgments	
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.
References		
[JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and scalable ontology matching. In ISWC, 2011.
[RG13] A. A. Romero, B. C. Grau, et al. MORe: a Modular OWL Reasoner for Ontology Classification. In ORE, pages 61-67, 2013.
[DP12] E. Daskalaki and D. Plexousakis. OtO Matching System: A Multi-strategy Approach to Instance Matching. In CAiSE, 2012.
[NL12] A.-C. Ngonga Ngomo and K. Lyko. EAGLE: Efficient Active Learning of Link Specifications using Genetic Programming. In ESWC, 2012.