Link Discovery Tutorial
Benchmarking for Instance Matching Systems
Axel-Cyrille Ngonga Ngomo(1)
, Irini Fundulaki(2)
, Mohamed Ahmed Sherif(1)
(1) Institute for Applied Informatics, Germany
(2) FORTH, Greece
October 18th, 2016
Kobe, Japan
Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 1 / 36
The Question(s)
Instance matching research has led to the development of
various systems.
What are the problems that I wish
to solve?
What are the relevant key
performance indicators?
What is the behavior of the existing
engines w.r.t. the key performance
indicators?
Which tool(s) should I use for my data and my use case?
Importance of Benchmarking
Benchmarks exist
To allow adequate measurements of systems
To provide evaluation of engines for real (or close to real) use cases
Provide help
Designers and Developers to assess the performance of their tools
Users to compare the different available tools and evaluate suitability for their
needs
Researchers to compare their work to others
Leads to improvements:
Vendors can improve their technology
Researchers can address new challenges
Current benchmark design can be improved to cover new necessities and
application domains
The Answer: Benchmark your engines!
An Instance Matching/Linking Benchmark comprises
Datasets: The raw material of the benchmarks. These are the source and
the target dataset that will be matched together to find the links between
resources
Test Cases: Address heterogeneities (structural, value, semantic) of the
datasets to be matched
Gold Standard (Ground Truth / Reference Alignment): The "correct
answer sheet" used to judge the completeness and soundness of the instance
matching algorithms
Metrics: The performance metric(s) that determine the system's behaviour
and performance
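The four components above can be tied together in a minimal evaluation harness. The sketch below is purely illustrative and not the API of any concrete benchmark; the matcher, the dataset shapes, and the gold standard links are all made up for the example:

```python
def run_benchmark(source, target, matcher, gold):
    """Run a matcher over a source/target dataset pair and score the
    produced links against the gold standard (a set of (src, tgt) pairs)."""
    predicted = matcher(source, target)
    tp = len(gold & predicted)  # links that are both predicted and correct
    return {
        "precision": tp / len(predicted) if predicted else 0.0,
        "recall": tp / len(gold) if gold else 0.0,
    }

# Toy matcher: link resources whose labels are identical.
def label_matcher(source, target):
    return {(s, t) for s, s_label in source.items()
                   for t, t_label in target.items() if s_label == t_label}

source = {"s1": "Kobe", "s2": "Tokyo"}
target = {"t1": "Kobe", "t2": "Osaka"}
scores = run_benchmark(source, target, label_matcher, gold={("s1", "t1")})
```

Real benchmarks differ mainly in how the four ingredients are produced, not in this overall loop.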
Benchmark Datasets: Characteristics
Nature
Real Datasets: Widely used datasets from a domain of interest
+ Realistic conditions for heterogeneity problems
+ Realistic distributions
- Error prone, hard to create Reference Alignment
Synthetic Datasets: Produced with a data generator (that hopefully produces
data with interesting characteristics)
+ Fully controlled test conditions
+ Accurate, Easy to create Reference Alignments
- Unrealistic distributions
- Systematic heterogeneity problems
Schema
Datasets to be matched have the same or different schemas
Domain
Datasets come from the same or different domains
Benchmark Test Cases: Variations
Value
Name style abbreviations, Typographical errors, change format
(date/gender/number), synonym change, language change (multilinguality)
Structural
Change property depth, Delete/add property, split property values,
transformation of object/data to data/object type property
Semantics
class deletion/modification, invert property assertions, change class/property
hierarchy, assert class disjointness
Combinations of Variations
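A few of the value variations above can be sketched as simple string transformations. These helpers are illustrative only, not taken from any of the generators discussed later:

```python
def abbreviate_name(name):
    """Value variation: name-style abbreviation ('John Smith' -> 'J. Smith')."""
    parts = name.split()
    if len(parts) < 2:
        return name
    return " ".join(p[0] + "." for p in parts[:-1]) + " " + parts[-1]

def reformat_date(iso_date):
    """Value variation: change date format from YYYY-MM-DD to DD/MM/YYYY."""
    y, m, d = iso_date.split("-")
    return f"{d}/{m}/{y}"

def swap_typo(value, pos=1):
    """Value variation: swap two adjacent characters to simulate a typo."""
    if len(value) < pos + 2:
        return value
    return value[:pos] + value[pos + 1] + value[pos] + value[pos + 2:]
```

Structural and semantic variations operate on the RDF graph (properties, class hierarchy) rather than on literal values, so they need an RDF-aware generator.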
Benchmark: Gold Standard
The "correct answer sheet" used to judge the completeness and soundness of
the instance matching algorithms
Characteristics
Existence of errors / missing alignments
Representation: owl:sameAs and skos:exactMatch
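A gold standard is typically shipped as a set of owl:sameAs links between source and target instances. A minimal serialization sketch using plain string formatting (a real benchmark would normally use an RDF library; the URIs here are made up):

```python
def gold_standard_turtle(pairs):
    """Serialize a reference alignment as owl:sameAs triples in Turtle."""
    lines = ["@prefix owl: <http://www.w3.org/2002/07/owl#> .", ""]
    for src, tgt in sorted(pairs):  # deterministic order for diffing
        lines.append(f"<{src}> owl:sameAs <{tgt}> .")
    return "\n".join(lines)

ttl = gold_standard_turtle({
    ("http://example.org/src/i1", "http://example.org/tgt/i1"),
})
```

skos:exactMatch links would be serialized the same way with the SKOS namespace instead.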
Benchmark: Metrics
Precision P = tp / (tp + fp)
Recall R = tp / (tp + fn)
F-measure F = (2 × P × R) / (P + R)
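With both the gold standard and the system output represented as sets of (source, target) links, the three metrics reduce to set operations. A small sketch (the link pairs are invented for the example):

```python
def precision_recall_f(gold, predicted):
    """Precision, recall and F-measure over sets of (source, target) links."""
    tp = len(gold & predicted)   # correct links found
    fp = len(predicted - gold)   # spurious links
    fn = len(gold - predicted)   # missed links
    p = tp / (tp + fp) if predicted else 0.0
    r = tp / (tp + fn) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

gold = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
pred = {("s1", "t1"), ("s2", "t2"), ("s5", "t9")}
p, r, f = precision_recall_f(gold, pred)  # p = 2/3, r = 1/2
```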
Instance Matching Benchmarks:
Desirable Attributes
Systematic Procedure matching tasks should be reproducible and the
execution must be comparable
Availability benchmark should be available
Quality precise evaluation rules and high quality ontologies
must be provided
Equity evaluation process should not privilege any system
Dissemination benchmark should be used to evaluate instance
matching systems
Volume dataset size
Gold Standard gold standard should exist and be as accurate as
possible
What about Benchmarks?
Instance matching techniques have, until recently, been
benchmarked in an ad-hoc way.
There is no standard way of benchmarking the performance
of the systems, when it comes to Linked Data.
Ontology Alignment Evaluation Initiative
IM benchmarks have been mainly driven forward by the Ontology
Alignment Evaluation Initiative (OAEI)
has organized annual ontology matching campaigns since 2005
hosts independent benchmarks
In 2009, OAEI introduced the Instance Matching (IM) Track
focuses on the evaluation of different instance matching techniques and tools
for Linked Data
Instance Matching Benchmarks
Benchmark Generators
Synthetic Benchmarks
Real Benchmarks
Semantic Web Instance Generation
(SWING) [FMN+11]
Semi-automatic generator of Instance Matching Benchmarks
Contributed to the generation of the IIMB Benchmarks of the OAEI 2010, 2011
and 2012 Instance Matching Tracks
Freely available at https://code.google.com/p/swing-generator/
All kinds of variations supported except multilinguality
Automatically produced gold standard
Lance [SDF+15b]
Flexible, generic and domain-independent benchmark generator which takes
into consideration RDFS and OWL constructs in order to evaluate instance
matching systems.
Lance [SDF+15b]
Lance provides support for:
Semantics-aware transformations
Complex class definitions (union, intersection)
Complex property definitions (functional properties, inverse functional
properties)
Disjointness (properties)
Standard value and structure based transformations
Weighted gold standard based on tensor factorization
Varying degrees of difficulty and fine-grained evaluation metrics
Available at http://github.com/jsaveta/Lance
Lance Architecture
Synthetic Benchmarks
Ontology Alignment Evaluation Benchmarks
Synthetic Instance Matching Benchmarks:
Overview (1)
|                      | IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 |
|----------------------|-----------|-----------|---------|-----------|--------------|-----------|-----------|-------------|------------------|
| Systematic Procedure | √         | √         | √       | √         | √            | √         | √         | √           | √                |
| Availability         | √         | √         | √       | √         | √            | -         | -         | √           | √                |
| Quality              | √         | √         | √       | √         | √            | √         | √         | √           | √                |
| Equity               | √         | √         | √       | √         | √            | √         | √         | √           | √                |
| Dissemination        | 6         | 3         | 6       | 1         | 3            | 4         | 4         | 5           | 5                |
| Volume               | 0.2K      | 1.4K      | 0.86K   | 4K        | 0.375K       | 1.5K      | 0.43K     | 2.65K       | 10K              |
| Gold Standard        | √         | √         | √       | √         | √            | √         | √         | √           | √                |
Synthetic Instance Matching Benchmarks:
Overview (2)
|                       | IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 |
|-----------------------|-----------|-----------|---------|-----------|--------------|-----------|-----------|-------------|------------------|
| Value Variations      | √         | √         | √       | √         | √            | √         | √         | √           | √                |
| Structural Variations | √         | √         | √       | √         | -            | -         | -         | +           | +                |
| Logical Variations    | √         | √         | -       | √         | -            | √         | -         | -           | -                |
| Multilinguality       | -         | -         | -       | -         | -            | -         | √         | √           | √                |
| Blind Evaluations     | -         | -         | -       | -         | -            | -         | √         | √           | √                |
| 1-n Mappings          | -         | -         | √       | -         | -            | -         | √         | √           | -                |
Synthetic Instance Matching Benchmarks:
Overview (3)
|                      | IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 | Lance 2015 |
|----------------------|-----------|-----------|---------|-----------|--------------|-----------|-----------|-------------|------------------|------------|
| Systematic Procedure | √         | √         | √       | √         | √            | √         | √         | √           | √                | √          |
| Availability         | √         | √         | √       | √         | √            | -         | -         | √           | √                | √          |
| Quality              | √         | √         | √       | √         | √            | √         | √         | √           | √                | √          |
| Equity               | √         | √         | √       | √         | √            | √         | √         | √           | √                | √          |
| Dissemination        | 6         | 3         | 6       | 1         | 3            | 4         | 4         | 5           | 5                | 2          |
| Volume               | 0.2K      | 1.4K      | 0.86K   | 4K        | 0.375K       | 1.5K      | 0.43K     | 2.65K       | 10K              | > 1M       |
| Gold Standard        | √         | √         | √       | √         | √            | √         | √         | √           | √                | √          |
Synthetic Instance Matching Benchmarks:
Overview (4)
|                       | IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 | Lance 2015 |
|-----------------------|-----------|-----------|---------|-----------|--------------|-----------|-----------|-------------|------------------|------------|
| Value Variations      | √         | √         | √       | √         | √            | √         | √         | √           | √                | √          |
| Structural Variations | √         | √         | √       | √         | -            | -         | -         | +           | +                | +          |
| Logical Variations    | √         | √         | -       | √         | -            | √         | -         | -           | -                | +          |
| Multilinguality       | -         | -         | -       | -         | -            | -         | √         | √           | √                | √          |
Synthetic Instance Matching Benchmarks:
Overview (5)
|                   | IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 | Lance 2015 |
|-------------------|-----------|-----------|---------|-----------|--------------|-----------|-----------|-------------|------------------|------------|
| Blind Evaluations | -         | -         | -       | -         | -            | -         | √         | √           | √                | √          |
| 1-n Mappings      | -         | -         | √       | -         | -            | -         | √         | √           | -                | -          |
Real Benchmarks
Real Instance Matching Benchmarks:
Overview (1)
|                      | ARS  | DI 2010 | DI 2011 |
|----------------------|------|---------|---------|
| Systematic Procedure | √    | √       | √       |
| Availability         | √    | √       | -       |
| Quality              | √    | √       | √       |
| Equity               | √    | √       | √       |
| Dissemination        | 5    | 2       | 3       |
| Volume               | 100K | 6K      | NA      |
| Gold Standard        | √    | √       | +       |
Real Instance Matching Benchmarks:
Overview (2)
|                       | ARS | DI 2010 | DI 2011 |
|-----------------------|-----|---------|---------|
| Value Variations      | √   | √       | √       |
| Structural Variations | √   | √       | -       |
| Logical Variations    | -   | -       | -       |
| Multilinguality       | -   | -       | -       |
| Blind Evaluations     | -   | -       | -       |
Wrapping Up
Multilinguality
Value Variations
Structural Variations
Logical Variations
Combinations of Variations
Scalability
Open Issues
Only one benchmark tackles both combinations of variations and
scalability issues
Not enough IM benchmarks use the full expressiveness of the RDF/OWL
languages
Systems
Systems can handle value variations, structural variations, and simple
logical variations separately.
More work needed for complex variations (combination of value, structural,
and logical)
More work needed for structural variations
Enhancement of systems to cope with the clustering of the mappings (1-n
mappings)
Conclusions
Many instance matching benchmarks have been proposed,
each answering some of the needs of instance matching systems.
It is essential to start creating benchmarks that will “show the way to the
future”
Extend the limits of existing systems.
Acknowledgment
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
References I

More Related Content

What's hot (20)

PDF
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Julien PLU
 
PDF
Enhancing Entity Linking by Combining NER Models
Julien PLU
 
PDF
Interactive Knowledge Discovery over Web of Data.
Mehwish Alam
 
PDF
Learning Commonalities in RDF
Sara EL HASSAD
 
PDF
Applications of Word Vectors in Text Retrieval and Classification
shakimov
 
PDF
Detecting paraphrases using recursive autoencoders
Feynman Liang
 
PDF
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
shakimov
 
PDF
Recursive Autoencoders for Paraphrase Detection (Socher et al)
Feynman Liang
 
PDF
ESWC 2013 Poster: Representing and Querying Negative Knowledge in RDF
Fariz Darari
 
PDF
Reflection and Metadata
Michal Píše
 
PDF
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
FedorNikolaev
 
PPT
F# and the DLR
Richard Minerich
 
PDF
Multiple Dispatch
Michal Píše
 
PDF
MediaEval 2015 - GTM-UVigo Systems for the Query-by-Example Search on Speech ...
multimediaeval
 
PDF
DCU Search Runs at MediaEval 2014 Search and Hyperlinking
multimediaeval
 
PDF
Federation and Navigation in SPARQL 1.1
net2-project
 
PPT
SPARQL in a nutshell
Fabien Gandon
 
PPTX
Exchange and Consumption of Huge RDF Data
Mario Arias
 
PDF
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
Jeff Z. Pan
 
PPTX
Compact Representation of Large RDF Data Sets for Publishing and Exchange
WU (Vienna University of Economics and Business)
 
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Julien PLU
 
Enhancing Entity Linking by Combining NER Models
Julien PLU
 
Interactive Knowledge Discovery over Web of Data.
Mehwish Alam
 
Learning Commonalities in RDF
Sara EL HASSAD
 
Applications of Word Vectors in Text Retrieval and Classification
shakimov
 
Detecting paraphrases using recursive autoencoders
Feynman Liang
 
Learning Multilingual Semantic Parsers for Question Answering over Linked Dat...
shakimov
 
Recursive Autoencoders for Paraphrase Detection (Socher et al)
Feynman Liang
 
ESWC 2013 Poster: Representing and Querying Negative Knowledge in RDF
Fariz Darari
 
Reflection and Metadata
Michal Píše
 
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
FedorNikolaev
 
F# and the DLR
Richard Minerich
 
Multiple Dispatch
Michal Píše
 
MediaEval 2015 - GTM-UVigo Systems for the Query-by-Example Search on Speech ...
multimediaeval
 
DCU Search Runs at MediaEval 2014 Search and Hyperlinking
multimediaeval
 
Federation and Navigation in SPARQL 1.1
net2-project
 
SPARQL in a nutshell
Fabien Gandon
 
Exchange and Consumption of Huge RDF Data
Mario Arias
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
Jeff Z. Pan
 
Compact Representation of Large RDF Data Sets for Publishing and Exchange
WU (Vienna University of Economics and Business)
 

Similar to Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems (20)

PDF
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
Evangelia Daskalaki
 
PDF
Instance Matching Benchmarks for Linked Data - ESWC 2016 Tutorial
Holistic Benchmarking of Big Linked Data
 
PDF
Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017
Holistic Benchmarking of Big Linked Data
 
PPTX
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Graph-TA
 
PPTX
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Ioan Toma
 
PPTX
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
LDBC council
 
PDF
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
Big Data Value Association
 
PPTX
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
DataBench
 
PDF
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Gábor Szárnyas
 
PDF
Hobbit presentation at Apache Big Data Europe 2016
Holistic Benchmarking of Big Linked Data
 
PDF
Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...
12th International Conference on Digital Preservation (iPRES 2015)
 
PPTX
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
DataBench
 
PDF
The value of benchmarking IT projects - H.S. van Heeringen
Harold van Heeringen
 
PPTX
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
DataBench
 
ODP
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
LDBC council
 
PDF
2016 VLDB - The iBench Integration Metadata Generator
Boris Glavic
 
PDF
Benchmarks
Amit Kumar Rathi
 
PDF
Benchmarking Versioning for Big Linked Data
Graph-TA
 
PDF
Presentation of HOBBIT's versioning benchmark at Graph-TA
Holistic Benchmarking of Big Linked Data
 
PDF
Bytewise approximate matching, searching and clustering
Liwei Ren任力偉
 
ISWC 2014 Tutorial - Instance Matching Benchmarks for Linked Data
Evangelia Daskalaki
 
Instance Matching Benchmarks for Linked Data - ESWC 2016 Tutorial
Holistic Benchmarking of Big Linked Data
 
Instance Matching Benchmarks in the ERA of Linked Data - ISWC2017
Holistic Benchmarking of Big Linked Data
 
SPIMBENCH: A Scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Graph-TA
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Ioan Toma
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
LDBC council
 
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
Big Data Value Association
 
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
DataBench
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Gábor Szárnyas
 
Hobbit presentation at Apache Big Data Europe 2016
Holistic Benchmarking of Big Linked Data
 
Benchmarks for Digital Preservation tools. Kresimir Duretec, Artur Kulmukhame...
12th International Conference on Digital Preservation (iPRES 2015)
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
DataBench
 
The value of benchmarking IT projects - H.S. van Heeringen
Harold van Heeringen
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
DataBench
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
LDBC council
 
2016 VLDB - The iBench Integration Metadata Generator
Boris Glavic
 
Benchmarks
Amit Kumar Rathi
 
Benchmarking Versioning for Big Linked Data
Graph-TA
 
Presentation of HOBBIT's versioning benchmark at Graph-TA
Holistic Benchmarking of Big Linked Data
 
Bytewise approximate matching, searching and clustering
Liwei Ren任力偉
 
Ad

More from Holistic Benchmarking of Big Linked Data (20)

PDF
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
Holistic Benchmarking of Big Linked Data
 
PDF
Benchmarking Big Linked Data: The case of the HOBBIT Project
Holistic Benchmarking of Big Linked Data
 
PDF
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Holistic Benchmarking of Big Linked Data
 
PDF
The DEBS Grand Challenge 2018
Holistic Benchmarking of Big Linked Data
 
PPTX
Benchmarking of distributed linked data streaming systems
Holistic Benchmarking of Big Linked Data
 
PDF
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
Holistic Benchmarking of Big Linked Data
 
PDF
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
Holistic Benchmarking of Big Linked Data
 
PPTX
The DEBS Grand Challenge 2017
Holistic Benchmarking of Big Linked Data
 
PDF
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
Holistic Benchmarking of Big Linked Data
 
PDF
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Holistic Benchmarking of Big Linked Data
 
PDF
Scalable Link Discovery for Modern Data-Driven Applications
Holistic Benchmarking of Big Linked Data
 
PDF
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Holistic Benchmarking of Big Linked Data
 
PPTX
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
Holistic Benchmarking of Big Linked Data
 
PDF
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Holistic Benchmarking of Big Linked Data
 
PDF
OKE2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
PDF
MOCHA 2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
PDF
Dynamic planning for link discovery - ESWC 2018
Holistic Benchmarking of Big Linked Data
 
PDF
Hobbit project overview presented at EBDVF 2017
Holistic Benchmarking of Big Linked Data
 
PDF
Leopard ISWC Semantic Web Challenge 2017 (poster)
Holistic Benchmarking of Big Linked Data
 
PDF
Leopard ISWC Semantic Web Challenge 2017
Holistic Benchmarking of Big Linked Data
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
Holistic Benchmarking of Big Linked Data
 
Benchmarking Big Linked Data: The case of the HOBBIT Project
Holistic Benchmarking of Big Linked Data
 
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Holistic Benchmarking of Big Linked Data
 
The DEBS Grand Challenge 2018
Holistic Benchmarking of Big Linked Data
 
Benchmarking of distributed linked data streaming systems
Holistic Benchmarking of Big Linked Data
 
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
Holistic Benchmarking of Big Linked Data
 
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
Holistic Benchmarking of Big Linked Data
 
The DEBS Grand Challenge 2017
Holistic Benchmarking of Big Linked Data
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
Holistic Benchmarking of Big Linked Data
 
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Holistic Benchmarking of Big Linked Data
 
Scalable Link Discovery for Modern Data-Driven Applications
Holistic Benchmarking of Big Linked Data
 
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Holistic Benchmarking of Big Linked Data
 
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
Holistic Benchmarking of Big Linked Data
 
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Holistic Benchmarking of Big Linked Data
 
OKE2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
MOCHA 2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
Dynamic planning for link discovery - ESWC 2018
Holistic Benchmarking of Big Linked Data
 
Hobbit project overview presented at EBDVF 2017
Holistic Benchmarking of Big Linked Data
 
Leopard ISWC Semantic Web Challenge 2017 (poster)
Holistic Benchmarking of Big Linked Data
 
Leopard ISWC Semantic Web Challenge 2017
Holistic Benchmarking of Big Linked Data
 
Ad

Recently uploaded (20)

PPTX
Neuroinflammation and microglial subtypes
KanakChaudhary10
 
PPTX
Q1_Science 8_Week3-Day 1.pptx science lesson
AizaRazonado
 
PPTX
ION EXCHANGE CHROMATOGRAPHY NEW PPT (JA).pptx
adhagalejotshna
 
PPT
Restriction digestion of DNA for students of undergraduate and post graduate ...
DrMukeshRameshPimpli
 
PDF
Rapid protoplanet formation in the outer Solar System recorded in a dunite fr...
Sérgio Sacani
 
PDF
High-speedBouldersandtheDebrisFieldinDARTEjecta
Sérgio Sacani
 
DOCX
Paper - Taboo Language (Makalah Presentasi)
Sahmiral Amri Rajagukguk
 
PPTX
Cerebellum_ Parts_Structure_Function.pptx
muralinath2
 
PDF
Preserving brand authenticity amid AI-driven misinformation: Sustaining consu...
Selcen Ozturkcan
 
PDF
Unit-3 ppt.pdf organic chemistry - 3 unit 3
visionshukla007
 
PPTX
770043401-q1-Ppt-pe-and-Health-7-week-1-lesson-1.pptx
AizaRazonado
 
DOCX
Critical Book Review (CBR) - "Hate Speech: Linguistic Perspectives"
Sahmiral Amri Rajagukguk
 
PDF
Adding Geochemistry To Understand Recharge Areas - Kinney County, Texas - Jim...
Texas Alliance of Groundwater Districts
 
PPTX
Basal_ganglia_Structure_Function_Importance
muralinath2
 
DOCX
Paper - Suprasegmental Features (Makalah Presentasi)
Sahmiral Amri Rajagukguk
 
PPT
Experimental Design by Cary Willard v3.ppt
MohammadRezaNirooman1
 
PPTX
Class12_Physics_Chapter2 electric potential and capacitance.pptx
mgmahati1234
 
PDF
Portable Hyperspectral Imaging (pHI) for the enhanced recording of archaeolog...
crabbn
 
PDF
Global Congress on Forensic Science and Research
infoforensicscience2
 
PDF
A High-Caliber View of the Bullet Cluster through JWST Strong and Weak Lensin...
Sérgio Sacani
 
Neuroinflammation and microglial subtypes
KanakChaudhary10
 
Q1_Science 8_Week3-Day 1.pptx science lesson
AizaRazonado
 
ION EXCHANGE CHROMATOGRAPHY NEW PPT (JA).pptx
adhagalejotshna
 
Restriction digestion of DNA for students of undergraduate and post graduate ...
DrMukeshRameshPimpli
 
Rapid protoplanet formation in the outer Solar System recorded in a dunite fr...
Sérgio Sacani
 
High-speedBouldersandtheDebrisFieldinDARTEjecta
Sérgio Sacani
 
Paper - Taboo Language (Makalah Presentasi)
Sahmiral Amri Rajagukguk
 
Cerebellum_ Parts_Structure_Function.pptx
muralinath2
 
Preserving brand authenticity amid AI-driven misinformation: Sustaining consu...
Selcen Ozturkcan
 
Unit-3 ppt.pdf organic chemistry - 3 unit 3
visionshukla007
 
770043401-q1-Ppt-pe-and-Health-7-week-1-lesson-1.pptx
AizaRazonado
 
Critical Book Review (CBR) - "Hate Speech: Linguistic Perspectives"
Sahmiral Amri Rajagukguk
 
Adding Geochemistry To Understand Recharge Areas - Kinney County, Texas - Jim...
Texas Alliance of Groundwater Districts
 
Basal_ganglia_Structure_Function_Importance
muralinath2
 
Paper - Suprasegmental Features (Makalah Presentasi)
Sahmiral Amri Rajagukguk
 
Experimental Design by Cary Willard v3.ppt
MohammadRezaNirooman1
 
Class12_Physics_Chapter2 electric potential and capacitance.pptx
mgmahati1234
 
Portable Hyperspectral Imaging (pHI) for the enhanced recording of archaeolog...
crabbn
 
Global Congress on Forensic Science and Research
infoforensicscience2
 
A High-Caliber View of the Bullet Cluster through JWST Strong and Weak Lensin...
Sérgio Sacani
 

Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems

  • 1. Link Discovery Tutorial Benchmarking for Instance Matching Systems Axel-Cyrille Ngonga Ngomo(1) , Irini Fundulaki(2) , Mohamed Ahmed Sherif(1) (1) Institute for Applied Informatics, Germany (2) FORTH, Greece October 18th, 2016 Kobe, Japan Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 1 / 36
  • 2. The Questions(s) Instance matching research has led to the development of various systems. What are the problems that I wish to solve? What are the relevant key performance indicators? What is the behavior of the existing engines w.r.t. the key performance indicators? Which are the tool(s) that I should use for my data and for my use case? Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 2 / 36
  • 3. Importance of Benchmarking Benchmarks exist To allow adequate measurements of systems To provide evaluation of engines for real (or close to real) use cases Provide help Designers and Developers to assess the performance of their tools Users to compare the different available tools and evaluate suitability for their needs Researchers to compare their work to others Leads to improvements: Vendors can improve their technology Researchers can address new challenges Current benchmark design can be improved to cover new necessities and application domains Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 3 / 36
  • 4. The Answer: Benchmark your engines! Instance Matching/Linking Benchmark comprises of Datasets: The raw material of the benchmarks. These are the source and the target dataset that will be matched together to find the links between resources Test Cases: Address heterogeneities (structural, value, semantic) of the datasets to be matched Gold Standard (Ground Truth / Reference Alignment): The "correct answer sheet" used to judge the completeness and soundness of the instance matching algorithms Metrics: The performance metric(s) that determine the systems behaviour and performance Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 4 / 36
  • 5. Benchmark Datasets: Characteristics Nature Real Datasets: Widely used datasets from a domain of interest + Realistic conditions for heterogeneity problems + Realistic distributions - Error prone, hard to create Reference Alignment Synthetic Datasets: Produced with a data generator (that hopefully produces data with interesting characteristics + Fully controlled test conditions + Accurate, Easy to create Reference Alignments - Unrealistic distributions - Systematic heterogeneity problems Schema Datasets to be matched have the same or different schemas Domain Datasets come from the same or different domains Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 5 / 36
  • 6. Benchmark Test Cases: Variations Value Name style abbreviations, Typographical errors, change format (date/gender/number), synonym change, language change (multilinguality) Structural Change property depth, Delete/add property, split property values, transformation of object/data to data/object type property Semantics class deletion/modification, invert property assertions, change class/property hierarchy, assert class disjointness Combinations of Variations Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 6 / 36
  • 7. Benchmark: Gold Standard The "correct answer sheet" used to judge the completeness and soundness of the instance matching algorithms Characteristics Existence of errors / missing alignments Representation: owl:sameAs and skos:exactMatch Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 7 / 36
  • 8. Benchmark: Metrics Precision P = tp (tp+fn) Recall R = tp (tp+fp) F-measure F = 2 × P × R (P+R) Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 8 / 36
  • 9. Instance Matching Benchmarks: Desirable Attributes Systematic Procedure matching tasks should be reproducible and the exe- cution must be comparable Availability benchmark should be available Quality precise evaluation rules and high quality ontologies must be provided Equity evaluation process should not privilege any system Dissemination benchmark should be used to evaluate instance matching systems Volume dataset size Gold Standard gold standard should exist and be as accurate as pos- sible Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 9 / 36
  • 10. What about Benchmarks? Instance matching techniques have, until recently, been benchmarked in an ad-hoc way. There is no standard way of benchmarking the performance of the systems, when it comes to Linked Data. Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 10 / 36
  • 11. Ontology Alignment Evaluation Initiative IM benchmarks have been mainly driven forward by the Ontology Alignment Evaluation Initiative (OAEI) organizes annual campaign for ontology matching since 2005 hosts independent benchmarks In 2009, OAEI introduced the Instance Matching (IM) Track focuses on the evaluation of different instance matching techniques and tools for Linked Data Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 11 / 36
  • 12. Instance Matching Benchmarks Bechmark Generators Synthetic Benchmarks Real Benchmarks Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 12 / 36
  • 13. Semantic Web Instance Generation (SWING) [FMN+11] Semi automatic generator of Instance Matching Benchmarks Contributed in the generation of IIMB Benchmarks of OAEI in 2010, 2011 and 2012 Instance Matching Tracks Freely available at (https://code.google.com/p/swing-generator/) All kind of variations supported into the benchmarks except multilinguality Automatically produced gold standard Ngonga Ngomo et al. (AKSW & FORTH) LD Tutorial:Benchmarking October 17, 2016 13 / 36
  • 14. Lance [SDF+15b]
A flexible, generic and domain-independent benchmark generator that takes RDFS and OWL constructs into consideration in order to evaluate instance matching systems.
  • 15. Lance [SDF+15b]
Lance provides support for:
Semantics-aware transformations
Complex class definitions (union, intersection)
Complex property definitions (functional properties, inverse functional properties)
Disjointness (properties)
Standard value- and structure-based transformations
Weighted gold standard based on tensor factorization
Varying degrees of difficulty and fine-grained evaluation metrics
Available at http://github.com/jsaveta/Lance
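The weighted gold standard can be understood as follows: each correct link carries a weight reflecting how hard it is to find, and recall is computed over those weights, so finding difficult links counts for more than finding trivial ones. This is only a simplified sketch of the scoring idea; Lance derives the actual weights via tensor factorization, which is not reproduced here:

```python
def weighted_scores(produced: set, gold_weights: dict):
    """gold_weights maps each correct (source, target) link to a difficulty weight.
    Returns (precision, weighted recall)."""
    total = sum(gold_weights.values())
    found = sum(w for link, w in gold_weights.items() if link in produced)
    weighted_recall = found / total if total else 0.0
    # Precision is computed over the produced set in the usual unweighted way.
    precision = (sum(1 for link in produced if link in gold_weights) / len(produced)
                 if produced else 0.0)
    return precision, weighted_recall

# Illustrative: an easy link (weight 0.2) and a hard one (weight 0.8).
gold_w = {("s:a", "t:a"): 0.2, ("s:b", "t:b"): 0.8}
print(weighted_scores({("s:a", "t:a")}, gold_w))  # (1.0, 0.2)
```

A system that only recovers the easy link scores a weighted recall of 0.2 despite finding half of the links, which is exactly the fine-grained, difficulty-aware behaviour the slide describes.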
  • 16. Lance Architecture
  • 17. Synthetic Benchmarks
Ontology Alignment Evaluation Benchmarks
  • 18. Synthetic Instance Matching Benchmarks: Overview (1)
Benchmark:            IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015
Systematic Procedure: √ | √ | √ | √ | √ | √ | √ | √ | √
Availability:         √ | √ | √ | √ | √ | - | - | √ | √
Quality:              √ | √ | √ | √ | √ | √ | √ | √ | √
Equity:               √ | √ | √ | √ | √ | √ | √ | √ | √
Dissemination:        6 | 3 | 6 | 1 | 3 | 4 | 4 | 5 | 5
Volume:               0.2K | 1.4K | 0.86K | 4K | 0.375K | 1.5K | 0.43K | 2.650K | 10K
Gold Standard:        √ | √ | √ | √ | √ | √ | √ | √ | √
  • 19. Synthetic Instance Matching Benchmarks: Overview (2)
Benchmark:             IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015
Value Variations:      √ | √ | √ | √ | √ | √ | √ | √ | √
Structural Variations: √ | √ | √ | √ | - | - | - | + | +
Logical Variations:    √ | √ | - | √ | - | √ | - | - | -
Multilinguality:       - | - | - | - | - | - | √ | √ | √
Blind Evaluations:     - | - | - | - | - | - | √ | √ | √
1-n Mappings:          - | - | √ | - | - | - | √ | √ | -
  • 20. Synthetic Instance Matching Benchmarks: Overview (3)
Benchmark:            IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 | Lance 2015
Systematic Procedure: √ | √ | √ | √ | √ | √ | √ | √ | √ | √
Availability:         √ | √ | √ | √ | √ | - | - | √ | √ | √
Quality:              √ | √ | √ | √ | √ | √ | √ | √ | √ | √
Equity:               √ | √ | √ | √ | √ | √ | √ | √ | √ | √
Dissemination:        6 | 3 | 6 | 1 | 3 | 4 | 4 | 5 | 5 | 2
Volume:               0.2K | 1.4K | 0.86K | 4K | 0.375K | 1.5K | 0.43K | 2.650K | 10K | > 1M
Gold Standard:        √ | √ | √ | √ | √ | √ | √ | √ | √ | √
  • 21. Synthetic Instance Matching Benchmarks: Overview (4)
Benchmark:             IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 | Lance 2015
Value Variations:      √ | √ | √ | √ | √ | √ | √ | √ | √ | √
Structural Variations: √ | √ | √ | √ | - | - | - | + | + | +
Logical Variations:    √ | √ | - | √ | - | √ | - | - | - | +
Multilinguality:       - | - | - | - | - | - | √ | √ | √ | √
  • 22. Synthetic Instance Matching Benchmarks: Overview (5)
Benchmark:         IIMB 2009 | IIMB 2010 | PR 2010 | IIMB 2011 | Sandbox 2012 | IIMB 2012 | RDFT 2013 | ID-REC 2014 | Author Task 2015 | Lance 2015
Blind Evaluations: - | - | - | - | - | - | √ | √ | √ | √
1-n Mappings:      - | - | √ | - | - | - | √ | √ | - | -
  • 23. Real Benchmarks
  • 24. Real Instance Matching Benchmarks: Overview (1)
Benchmark:            ARS | DI 2010 | DI 2011
Systematic Procedure: √ | √ | √
Availability:         √ | √ | -
Quality:              √ | √ | √
Equity:               √ | √ | √
Dissemination:        5 | 2 | 3
Volume:               100K | 6K | NA
Gold Standard:        √ | √ | +
  • 25. Real Instance Matching Benchmarks: Overview (2)
Benchmark:             ARS | DI 2010 | DI 2011
Value Variations:      √ | √ | √
Structural Variations: √ | √ | -
Logical Variations:    - | - | -
Multilinguality:       - | - | -
Blind Evaluations:     - | - | -
  • 26. Wrapping Up Multilinguality
  • 27. Wrapping Up Value Variations
  • 28. Wrapping Up Structural Variations
  • 29. Wrapping Up Logical Variations
  • 30. Wrapping Up Combinations of Variations
  • 31. Wrapping Up Scalability
  • 32. Open Issues
Only one benchmark tackles both combinations of variations and scalability
Not enough IM benchmarks use the full expressiveness of the RDF/OWL languages
  • 33. Systems
Systems can handle value variations, structural variations, and simple logical variations separately
More work is needed for complex variations (combinations of value, structural, and logical variations)
More work is needed for structural variations
Systems need to be enhanced to cope with the clustering of mappings (1-n mappings)
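The 1-n mappings mentioned above arise when one source instance legitimately matches several target instances, so a flat list of links has to be grouped into clusters before it can be produced or judged correctly. A minimal sketch of that grouping step (the link format is illustrative):

```python
from collections import defaultdict

def cluster_mappings(links):
    """Group (source, target) links by source and return only the 1-n clusters,
    i.e. sources that map to more than one target."""
    clusters = defaultdict(set)
    for source, target in links:
        clusters[source].add(target)
    return {s: ts for s, ts in clusters.items() if len(ts) > 1}

# Illustrative: s:a is a 1-n case (two targets), s:b is an ordinary 1-1 link.
links = [("s:a", "t:1"), ("s:a", "t:2"), ("s:b", "t:3")]
print(cluster_mappings(links))  # {'s:a': {'t:1', 't:2'}}
```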
  • 34. Conclusions
Many instance matching benchmarks have been proposed, each answering some of the needs of instance matching systems
It is essential to start creating benchmarks that will "show the way to the future" and extend the limits of existing systems
  • 35. Acknowledgment
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
  • 36. References I