Combining Ontology Matchers
via Anomaly Detection
Alexander C. Müller and Heiko Paulheim
10/13/15 Alexander C. Müller, Heiko Paulheim 2
Motivation
• Most high-performing matching systems use multiple matchers
• How to combine multiple matchers into a single result?
• Common approaches (a selection):
– average, maximum, minimum matching score
– voting
– expert-modeled weights (e.g., 0.4·m1 + 0.3·m2 + 0.3·m3)
– supervised learning
• Proposal:
– use anomaly detection as an unsupervised aggregation method
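For reference, the baseline strategies listed above can be sketched in a few lines of Python for a single candidate pair. The three scores, the 0.5 voting threshold, and all function names are illustrative assumptions, not something specified on the slides. Supervised learning is left out because it additionally requires a labelled reference alignment.

```python
from statistics import mean

# Hypothetical scores of three base matchers m1, m2, m3 for one candidate pair.
scores = [0.9, 0.4, 0.7]

def aggregate_average(s):
    return mean(s)

def aggregate_max(s):
    return max(s)

def aggregate_min(s):
    return min(s)

def aggregate_vote(s, threshold=0.5):
    """Majority vote: a matcher votes 'match' if its score reaches the threshold."""
    votes = sum(1 for x in s if x >= threshold)
    return votes > len(s) / 2

def aggregate_weighted(s, weights=(0.4, 0.3, 0.3)):
    """Expert-modeled weighted sum, e.g. 0.4*m1 + 0.3*m2 + 0.3*m3."""
    return sum(w * x for w, x in zip(weights, s))

print(aggregate_average(scores), aggregate_max(scores), aggregate_min(scores))
print(aggregate_vote(scores))  # → True (two of three matchers score >= 0.5)
print(aggregate_weighted(scores))
```

All of these need either a fixed rule or hand-tuned parameters (threshold, weights), which is what an unsupervised aggregation tries to avoid.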
Idea
• Common definitions of anomaly/outlier detection:
– Outlier or anomaly detection methods are used to identify observations “that appear to deviate markedly from other members of the same sample”, i.e.,
– observations “that appear to be inconsistent with the remainder of the data”
• Rationale:
– for two ontologies with n and m concepts, there are n × m candidate correspondences
– the majority of them are non-matches
– the actual matches are a small minority that differs markedly from the rest
– so, we should be able to identify them as outliers
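The imbalance argument can be checked with a little arithmetic. The sketch below enumerates all n × m candidate pairs for two toy ontologies (the concept labels are invented). Assuming a one-to-one alignment, which the slides do not state, the share of candidates that can be true matches is bounded by min(n, m) / (n · m) = 1 / max(n, m).

```python
from itertools import product

# Hypothetical concept labels of two small ontologies (n = 4, m = 5).
onto_a = ["Paper", "Author", "Review", "Conference"]
onto_b = ["Article", "Writer", "Review", "Event", "Committee"]

# Every pair (a, b) is a candidate correspondence: n x m candidates in total.
candidates = list(product(onto_a, onto_b))

n, m = len(onto_a), len(onto_b)
# Assumption: in a one-to-one alignment at most min(n, m) candidates are matches.
max_matches = min(n, m)
max_match_ratio = max_matches / len(candidates)  # equals 1 / max(n, m)

print(len(candidates), max_matches, max_match_ratio)  # → 20 4 0.2
```

Under the same one-to-one assumption, two ontologies with 100 concepts each yield 10,000 candidates, of which at most 100 (1%) can be matches.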
Outlier Detection in a Nutshell
• Given a set of instances as feature vectors
– outlier detection assigns an outlier score to each instance
– higher outlier scores ↔ higher degree of outlierness
• Common approaches
– distance based
– density based
– clustering based
– model based
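As one concrete instance of the distance-based family, the sketch below scores each instance by its distance to its k-th nearest neighbour, so isolated points get higher scores. This is a common textbook variant chosen for illustration. The slides do not say which detector COMMAND uses, and the points and k = 2 are assumptions.

```python
import math

def knn_outlier_scores(points, k=2):
    """Distance-based outlier score: distance to the k-th nearest neighbour.

    Points that lie far from their neighbours receive higher scores.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D feature vectors: a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.8)]
scores = knn_outlier_scores(points, k=2)
print(max(range(len(points)), key=lambda i: scores[i]))  # → 4
```

Density-, clustering-, and model-based detectors replace this distance with other notions of "unusual", but they produce the same kind of output: one outlier score per instance.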
Aggregating Matchers via Anomaly Detection
• We run a set of base matchers
• Each base matcher score becomes a numerical feature
• Thus, our feature vectors consist of the individual matching scores (one vector per candidate pair)
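Here is a minimal sketch of how base matcher scores become feature vectors, with one vector per candidate pair and one dimension per matcher. The three label-based matchers are hypothetical stand-ins (exact match, a `difflib` string similarity, and token Jaccard on underscore-separated labels). They are not the 28 element-based matchers used in COMMAND.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical element-based base matchers; each maps a label pair to [0, 1].
def exact_match(a, b):
    return 1.0 if a.lower() == b.lower() else 0.0

def edit_similarity(a, b):
    # difflib's ratio() is a string similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_jaccard(a, b):
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

BASE_MATCHERS = [exact_match, edit_similarity, token_jaccard]

def feature_vectors(concepts_a, concepts_b, matchers=BASE_MATCHERS):
    """Map every candidate pair to a vector holding one score per base matcher."""
    return {
        (a, b): [matcher(a, b) for matcher in matchers]
        for a, b in product(concepts_a, concepts_b)
    }

vectors = feature_vectors(["Paper", "Program_Committee"], ["paper", "Committee"])
for pair, vec in vectors.items():
    print(pair, [round(x, 2) for x in vec])
```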
Aggregating Matchers via Anomaly Detection
• Example from the conference dataset
– note: reduced to two dimensions!
COMMAND: Full Pipeline
• Run a set of element-based matchers
– find a non-correlated subset
• Run a set of structure-based matchers on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Perform optional repair step
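The dimensionality-reduction step could look roughly like the following numpy sketch. A greedy filter keeps only matcher columns whose absolute Pearson correlation with every already-kept column stays below a threshold, and PCA (computed via SVD) then projects the remaining columns onto the leading principal components. The greedy order, the 0.9 threshold, the number of components, and the synthetic scores are all assumptions. The slides specify neither how correlation is measured nor how many components are kept.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedily keep matcher columns whose |Pearson correlation| with every
    already-kept column is below `threshold`; return the kept column indices."""
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, i]) < threshold for i in kept):
            kept.append(j)
    return kept

def pca(X, n_components=2):
    """Project mean-centred data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Synthetic scores of four hypothetical matchers on 200 candidate pairs.
rng = np.random.default_rng(0)
m1 = rng.random(200)
m2 = m1 + rng.normal(0.0, 0.01, 200)  # near-duplicate of m1, hence highly correlated
m3 = rng.random(200)
m4 = rng.random(200)
X = np.column_stack([m1, m2, m3, m4])

kept = drop_correlated(X, threshold=0.9)
Z = pca(X[:, kept], n_components=2)
print(kept, Z.shape)  # → [0, 2, 3] (200, 2)
```

Here `m2` is almost a copy of `m1`, so the filter discards it before PCA. The resulting components are mutually uncorrelated, which is one reason PCA is a natural follow-up to the correlation filter.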
COMMAND: Full Pipeline
• Run a set of 28 element-based matchers
– find a non-correlated subset
• Run a set of five structure-based matchers on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Normalize outlier scores
• Select mapping candidates
• Perform optional repair step
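The final steps (outlier detection, score normalization, candidate selection) might be sketched as follows. The k-nearest-neighbour detector, min-max normalization to [0, 1], the 0.5 threshold, and the greedy one-to-one selection are illustrative assumptions, not the confirmed COMMAND configuration. The optional repair step is not shown. The full pairwise distance matrix needs memory quadratic in the number of candidates, so this sketch only suits small candidate sets.

```python
import numpy as np

def knn_scores(Z, k=5):
    """Outlier score: distance to the k-th nearest neighbour (one possible detector)."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.sort(d, axis=1)[:, k - 1]

def min_max_normalize(s):
    """Rescale outlier scores to [0, 1] so they can serve as confidence values."""
    lo, hi = s.min(), s.max()
    return np.zeros_like(s) if hi == lo else (s - lo) / (hi - lo)

def select_candidates(pairs, scores, threshold=0.5):
    """Greedy one-to-one selection: visit pairs by decreasing score, stop below
    the threshold, and skip pairs whose source or target is already aligned."""
    used_a, used_b, alignment = set(), set(), []
    for idx in np.argsort(-scores):
        if scores[idx] < threshold:
            break
        a, b = pairs[idx]
        if a not in used_a and b not in used_b:
            alignment.append((a, b, float(scores[idx])))
            used_a.add(a)
            used_b.add(b)
    return alignment

# Hypothetical reduced feature vectors for six candidate pairs.
pairs = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
Z = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [2.0, 2.0], [2.1, 1.9]])
norm = min_max_normalize(knn_scores(Z, k=2))
alignment = select_candidates(pairs, norm, threshold=0.5)
print([(a, b) for a, b, _ in alignment])  # → [('a4', 'b4'), ('a3', 'b3')]
```

An outlier score has no direction: a candidate far from the bulk in any direction scores high. Treating high scores as matches relies on the rationale that the bulk consists of low-scoring non-matches.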
COMMAND: Results
• Good results on the biblio benchmark dataset
– up to 67% F-measure
• Median results on the conference dataset
– up to 68% F-measure
• Difficulties on the anatomy dataset
– only a subset of matchers could be run, for scalability reasons
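In ontology matching evaluations, F-measure is usually the harmonic mean of precision and recall, measured against a reference alignment. Assuming that standard F1 definition, the sketch below computes all three from sets of (source, target) pairs. The alignments are made up and are unrelated to the reported numbers.

```python
def evaluate(system, reference):
    """Precision, recall and F-measure (F1) of a system alignment against a
    reference alignment, both given as sets of (source, target) pairs."""
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical alignments: 3 of 4 proposed pairs are correct; the reference has 5.
reference = {("Paper", "Article"), ("Author", "Writer"), ("Review", "Review"),
             ("Conference", "Event"), ("PC_Member", "Committee_Member")}
system = {("Paper", "Article"), ("Author", "Writer"), ("Review", "Review"),
          ("Conference", "Committee")}
p, r, f = evaluate(system, reference)
print(p, r, round(f, 3))  # → 0.75 0.6 0.667
```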
Discussion and Conclusion
• Proof of concept
– anomaly detection is suitable for matcher aggregation
– non-trivial combination of matcher scores (PCA, outlier score)
– automatic selection of a suitable subset of matchers
• Future work
– address scalability issues
– try more anomaly detection approaches
