Combining Ontology Matchers via Anomaly Detection

The document explores the concept of combining multiple ontology matchers using anomaly detection as an unsupervised method for aggregation. It discusses the rationale behind identifying outliers among matching scores, highlights a full pipeline approach that includes running different matchers, dimensionality reduction, and outlier detection, and presents performance results on various datasets. The authors conclude that anomaly detection is effective for matcher aggregation while also noting future work needed to address scalability issues.

Combining Ontology Matchers
via Anomaly Detection
Alexander C. Müller and Heiko Paulheim

10/13/15 Alexander C. Müller, Heiko Paulheim 2
Motivation
• Most high-performing matching systems use multiple matchers
• How to combine multiple matchers into a single result?
• Common approaches (a selection)
– average, maximum, minimum matching score
– voting
– expert-modeled weights (e.g., 0.4·m1 + 0.3·m2 + 0.3·m3)
– supervised learning
• Proposal:
– use anomaly detection as an unsupervised aggregation method
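
The common aggregation approaches listed above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the three scores and the 0.5 voting threshold are made-up values, and only the weights 0.4/0.3/0.3 come from the slide.

```python
# Hypothetical scores of three base matchers (m1, m2, m3) for one candidate pair.
scores = [0.9, 0.4, 0.8]
# Expert-modeled weights, as in the slide's example.
weights = [0.4, 0.3, 0.3]

average = sum(scores) / len(scores)
maximum = max(scores)
minimum = min(scores)
weighted = sum(w * s for w, s in zip(weights, scores))

# Voting: a matcher votes "match" if its score reaches a threshold.
# The threshold of 0.5 is an assumption for illustration only.
votes = sum(s >= 0.5 for s in scores)
majority_vote = votes > len(scores) / 2

print(average, maximum, minimum, weighted, majority_vote)
```

All of these need a hand-picked rule or weight vector, which is the gap the unsupervised proposal aims to close.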

Idea
• Common definitions of anomaly/outlier detection:
– Outlier or anomaly detection methods are used to find instances “that appear to deviate markedly from other members of the same sample”, i.e.,
– instances “that appear to be inconsistent with the remainder of the data”
• Rationale:
– for two ontologies with n and m concepts, there are n × m candidates
– the majority are non-matches
– the actual matches are a minority (that differ markedly from the rest)
– so, we should be able to identify them as outliers
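
To make the rationale concrete, here is a small worked calculation. The ontology sizes are hypothetical, and the bound on the number of matches assumes a 1:1 alignment, which the slides do not state.

```python
# Hypothetical ontology sizes (not taken from the slides).
n, m = 60, 80

candidates = n * m            # every concept pair is a candidate
max_matches = min(n, m)       # upper bound, assuming a 1:1 alignment
share = max_matches / candidates

print(f"{candidates} candidates, at most {max_matches} matches ({share:.2%})")
```

Under these assumptions the true matches are a small minority of all candidates, which is the setting outlier detection is designed for.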

Outlier Detection in a Nutshell
• Given a set of instances as feature vectors
– outlier detection assigns an outlier score to each instance
– higher outlier scores ↔ higher degree of outlierness
• Common approaches
– distance based
– density based
– clustering based
– model based
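
As an illustration of the distance-based family, the sketch below scores each instance by the distance to its k-th nearest neighbour. The slides do not say which detector was used, so this is one simple representative, and the data points are invented.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Four instances close together and one far away (made-up feature vectors).
points = [(0.10, 0.10), (0.12, 0.08), (0.09, 0.11), (0.11, 0.12), (0.90, 0.95)]
scores = knn_outlier_scores(points, k=2)

# The instance with the highest score is the strongest outlier.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, scores)
```

A higher score means the instance sits further from its neighbours, matching the "higher score, higher outlierness" convention above.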

Aggregating Matchers via Anomaly Detection
• We run a set of base matchers
• Each base matcher score becomes a numerical feature
• Thus, our feature vectors consist of individual matching scores
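
A minimal sketch of how such feature vectors could be built. The three matchers and the concept labels are hypothetical stand-ins; the slides do not list the base matchers actually used.

```python
from difflib import SequenceMatcher
from itertools import product

def edit_similarity(a, b):
    """Character-level similarity from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_jaccard(a, b):
    """Jaccard overlap of underscore-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

def prefix_similarity(a, b):
    """Length of the common prefix relative to the longer label."""
    a, b = a.lower(), b.lower()
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

matchers = [edit_similarity, token_jaccard, prefix_similarity]

# Hypothetical concept labels of two small ontologies.
source = ["Paper", "Author", "Conference_Member"]
target = ["Article", "Author", "Member"]

# One feature vector per candidate pair, one dimension per base matcher.
features = {
    (s, t): [matcher(s, t) for matcher in matchers]
    for s, t in product(source, target)
}

for pair, vector in features.items():
    print(pair, [round(v, 2) for v in vector])
```

Each of the n × m candidate pairs becomes one instance that an outlier detector can score.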

Aggregating Matchers via Anomaly Detection
• Example from the conference dataset
– note: reduced to two dimensions!

COMMAND: Full Pipeline
• Run set of element-based matchers
– find non-correlated subset
• Run set of structure-based matchers on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Perform optional repair step

COMMAND: Full Pipeline
• Run set of element-based matchers (28 different ones)
– find non-correlated subset
• Run set of structure-based matchers (five different ones) on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Normalize outlier scores
• Select mapping candidates
• Perform optional repair step
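
The last steps of the pipeline, normalizing outlier scores and selecting mapping candidates, might be sketched as follows. Min-max normalization, the 0.5 threshold, and the greedy 1:1 selection are assumptions for illustration; the slides do not describe how these steps were implemented, and the raw scores are invented.

```python
def min_max_normalize(scores):
    """Rescale raw outlier scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {pair: (v - lo) / span if span else 0.0 for pair, v in scores.items()}

def select_candidates(normalized, threshold=0.5):
    """Keep pairs at or above the threshold, greedily enforcing a 1:1 mapping."""
    used_source, used_target, mapping = set(), set(), []
    ranked = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
    for (s, t), score in ranked:
        if score < threshold:
            break
        if s not in used_source and t not in used_target:
            mapping.append((s, t, score))
            used_source.add(s)
            used_target.add(t)
    return mapping

# Hypothetical raw outlier scores for five candidate pairs.
raw = {
    ("Paper", "Article"): 4.0,
    ("Paper", "Author"): 0.4,
    ("Author", "Author"): 5.2,
    ("Author", "Article"): 0.2,
    ("Paper", "Document"): 3.1,
}

mapping = select_candidates(min_max_normalize(raw))
for s, t, score in mapping:
    print(f"{s} = {t} ({score:.2f})")
```

In this sketch the normalized outlier score doubles as the confidence of the mapping, so candidates that stand out most from the bulk of non-matches are selected first.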

COMMAND: Results
• Good results on biblio benchmark dataset
– up to 67% F-measure
• Median results on conference
– up to 68% F-measure
• Difficulties on anatomy dataset
– only a subset of matchers could be run for scalability reasons
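
For reference, the F-measure reported above is the harmonic mean of precision and recall against a reference alignment. The sketch below uses invented alignments and does not reproduce the reported results.

```python
def f_measure(found, reference):
    """Harmonic mean of precision and recall of a found alignment."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical alignments: 3 mappings found, 2 of them in a reference of 4.
found = [("Paper", "Article"), ("Author", "Author"), ("Chair", "Member")]
reference = [
    ("Paper", "Article"),
    ("Author", "Author"),
    ("Review", "Review"),
    ("Conference_Member", "Member"),
]

score = f_measure(found, reference)
print(f"F-measure: {score:.3f}")
```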

Discussion and Conclusion
• Proof of Concept
– Anomaly detection is suitable for matcher aggregation
– non-trivial combination of matcher scores (PCA, outlier score)
– automatic selection of a suitable subset of matchers
• Future work
– address scalability issues
– try more anomaly detection approaches
