7/2/19 Heiko Paulheim 1
Machine Learning & Embeddings
for Large Knowledge Graphs
Heiko Paulheim
Crossing the Bridge from the Other Side
• There are plenty of established ML and DM toolkits...
– Weka
– RapidMiner
– scikit-learn
– R
• ...implementing all your favorite algorithms...
– Naive Bayes
– Random Forests
– SVMs
– (Deep) Neural Networks
– ...
• ...but they all work on feature vectors, not graphs!
Typical Tasks
• Knowledge Graph Internal
– Type prediction
– Link prediction
– Link validation
• Knowledge Graph External
– i.e., using the KG as background knowledge in some other task
– e.g., content-based recommender systems
– e.g., predictive modeling
  ● who is the next Nobel Prize winner?
Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics.
Scientific Programming, 2014
Xu et al.: Explainable Reasoning over Knowledge Graphs for Recommendation. ebay tech blog, 2019
Example: Knowledge Graph Internal
• Type prediction
– Many instances in KGs are not typed or have very abstract types
– e.g., many actors are just typed as persons
• Classic approach
– Exploit ontology
– Shown to be rather sensitive to noise
• Example: ontology-based typing of Germany in DBpedia
– Airport, Award, Building, City, Country, Ethnic Group, Genre,
Language, Military Conflict, Mountain, Mountain Range, Person
Function, Place, Populated Place, Race, Route of Transportation,
Settlement, Stadium, Wine Region
Paulheim & Bizer: Type Inference on Noisy RDF Data. ISWC, 2013
Melo et al.: Type Prediction in Noisy RDF Knowledge Bases using Hierarchical Multilabel Classification with
Graph and Latent Features. IJAIT, 2017
Example: Knowledge Graph Internal
• Alternative: learn model for type prediction
– Train classifier to predict types (binary or hierarchical)
– More noise tolerant
Paulheim & Bizer: Improving the quality of linked data using statistical distributions. IJSWIS, 2014
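The learned approach can be sketched as a plain classification problem: represent each entity by the properties it uses, and predict its type from similar training entities. A minimal stand-in (toy 1-nearest-neighbour instead of a trained classifier; all entities and properties below are made up):

```python
# Toy training data: entity -> (set of its properties, known type)
train = {
    "Tom_Hanks":    ({"actedIn", "birthPlace"}, "Actor"),
    "Meryl_Streep": ({"actedIn", "award"}, "Actor"),
    "Berlin":       ({"country", "populationTotal"}, "City"),
    "Mannheim":     ({"country", "mayor"}, "City"),
}

def predict_type(props):
    """Assign the type of the training entity whose property set has
    the largest Jaccard overlap with the unseen entity's properties."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    return max(train.values(), key=lambda tv: jaccard(props, tv[0]))[1]

print(predict_type({"actedIn", "birthPlace", "spouse"}))
print(predict_type({"country", "populationTotal", "area"}))
```

Because the prediction only looks at property usage, it tolerates a wrongly asserted type far better than ontology-based inference.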
Example: Knowledge Graph External
• Example machine learning task: predicting book sales
ISBN          | City      | Sold
3-2347-3427-1 | Darmstadt | 124
3-43784-324-2 | Mannheim  | 493
3-145-34587-0 | Roßdorf   | 14
...

ISBN          | City      | Population | ... | Genre  | Publisher    | ... | Sold
3-2347-3427-1 | Darmstadt | 144402     | ... | Crime  | Bloody Books | ... | 124
3-43784-324-2 | Mannheim  | 291458     | ... | Crime  | Guns Ltd.    | ... | 493
3-145-34587-0 | Roßdorf   | 12019      | ... | Travel | Up&Away      | ... | 14
...
→ Crime novels sell better in larger cities
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
Example: The FeGeLOD Framework
[Figure: the FeGeLOD pipeline on the book sales table — Named Entity Recognition links the City value "Darmstadt" to http://dbpedia.org/resource/Darmstadt; Feature Generation adds attributes such as City_URI_dbpedia-owl:populationTotal = 141471; Feature Selection keeps only the useful ones.]
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
The FeGeLOD Framework
• Entity Recognition
– Simple approach: guess DBpedia URIs
– Hit rate >95% for cities and countries (by English name)
• Feature Generation
– augmenting the dataset with additional attributes from KG
• Feature Selection
– Filter noisy features, e.g., attributes whose values are >95% unknown, identical, or all-different nominals
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
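The "guess DBpedia URIs" step is essentially a string heuristic — a minimal sketch (the real framework additionally checks whether the guessed URI actually exists):

```python
def guess_dbpedia_uri(name):
    """Guess a DBpedia resource URI from an English entity name:
    resource URIs are the label with spaces replaced by underscores."""
    return "http://dbpedia.org/resource/" + name.strip().replace(" ", "_")

print(guess_dbpedia_uri("Darmstadt"))      # http://dbpedia.org/resource/Darmstadt
print(guess_dbpedia_uri("New York City"))  # http://dbpedia.org/resource/New_York_City
```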
Propositionalization
• Bridge Problem: Knowledge Graphs
vs. ML algorithms expecting Feature Vectors
→ wanted: a transformation from nodes to sets of features
Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked
Open Data. LD4KD, 2014
• Basic strategies:
– literal values (e.g., population) are used directly
– instance types become binary features
– relations are counted (absolute, relative, TF-IDF)
– combinations of relations and object types are counted
(absolute, relative, TF-IDF)
– ...
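These basic strategies fit in a few lines; the toy triples below are made up for illustration:

```python
from collections import Counter

# Toy knowledge graph as (subject, predicate, object) triples;
# in practice these would come from DBpedia or another KG.
triples = [
    ("Darmstadt", "rdf:type", "City"),
    ("Darmstadt", "rdf:type", "PopulatedPlace"),
    ("Darmstadt", "populationTotal", 144402),
    ("Darmstadt", "country", "Germany"),
    ("Darmstadt", "leader", "Jochen_Partsch"),
]

def propositionalize(entity, triples):
    """Turn a KG entity into a flat feature dict: literal values are used
    directly, types become binary features, relations are counted."""
    features = {}
    rel_counts = Counter()
    for s, p, o in triples:
        if s != entity:
            continue
        if isinstance(o, (int, float)):      # literal values used directly
            features[p] = o
        elif p == "rdf:type":                # types become binary features
            features[f"type={o}"] = 1
        else:                                # object relations are counted
            rel_counts[p] += 1
    features.update({f"out({p})": c for p, c in rel_counts.items()})
    return features

features = propositionalize("Darmstadt", triples)
print(features)
```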
Propositionalization ctd.
• Observations
– Even simple features (e.g., add all numbers and types)
can help on many problems
– More sophisticated features often bring additional improvements
  ● Combinations of relations and individuals
    – e.g., movies directed by Steven Spielberg
  ● Combinations of relations and types
    – e.g., movies directed by Oscar-winning directors
  ● …
– But
  ● The search space is enormous!
  ● “Generate first, filter later” does not scale well
From Naive Propositionalization to
Knowledge Graph Embeddings
• Reconsidering the previous examples:
– We want to predict some attribute of a KG entity
  ● e.g., types
  ● e.g., sales figures of books
– ...given the entity’s vector representation
• How do we get a “good” vector representation for an entity?
– ...and: what is “good” in the first place?
• “good” for machine learning means separable
– similar entities are close together
– different entities are further away
https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/
A Brief Excursion to word2vec
• A vector space model for words
• Introduced in 2013
• Each word becomes a vector
– similar words are close
– relations are preserved
– vector arithmetic is possible
https://www.adityathakker.com/introduction-to-word2vec-how-it-works/
A Brief Excursion to word2vec
• Assumption:
– Similar words appear in similar contexts
{Bush,Obama,Trump} was elected president of the United States
United States president {Bush,Obama,Trump} announced…
…
• Idea
– Train a network that can predict a word from its context (CBOW)
or the context from a word (Skip Gram)
Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
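The training data for skip-gram is nothing more than (target, context) pairs within a window — a minimal sketch of the pair extraction:

```python
def skipgram_pairs(sentence, window=2):
    """Generate (target, context) training pairs for skip-gram:
    each word predicts the words within `window` positions around it."""
    tokens = sentence.split()
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs("Obama was elected president", window=1)
print(pairs)
```

CBOW uses the same pairs in the opposite direction: the context words jointly predict the target.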
A Brief Excursion to word2vec
• Skip Gram: train a neural network with one hidden layer
• Use output values at hidden layer as vector representation
• Observation:
– Bush, Obama, Trump will activate similar context words
– i.e., their output weights at the projection layer have to be similar
Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
From word2vec to RDF2vec
• Word2vec operates on sentences, i.e., sequences of words
• Idea of RDF2vec
– First extract “sentences” from a graph
– Then train a word2vec embedding on those “sentences”
• “Sentences” are extracted by performing random graph walks:
  Year Zero –artist→ Nine Inch Nails –member→ Trent Reznor
• Experiments
  – RDF2vec can be trained on large KGs (DBpedia, Wikidata)
  – 300-500 dimensional vectors outperform other propositionalization strategies
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
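Walk extraction can be sketched as follows (toy graph with made-up edges; real RDF2vec generates many walks per entity and feeds them to word2vec):

```python
import random

# Toy graph: entity -> list of (predicate, object) edges
graph = {
    "Year_Zero": [("artist", "Nine_Inch_Nails")],
    "Nine_Inch_Nails": [("member", "Trent_Reznor"), ("genre", "Industrial_Rock")],
}

def random_walk(graph, start, depth, rng=random):
    """One RDF2vec 'sentence': a random walk whose tokens alternate
    between entities and predicates."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:
            break
        pred, obj = rng.choice(edges)
        walk += [pred, obj]
        node = obj
    return walk

walk = random_walk(graph, "Year_Zero", depth=2, rng=random.Random(0))
print(walk)
```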
From word2vec to RDF2vec
• RDF2vec example
– similar instances form clusters
– direction of relations is stable
From word2vec to RDF2vec
• RecSys example: using proximity in latent RDF2vec feature space
Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
Extensions of RDF2vec
• Maybe random walks are not such a good idea
  – They may give too much weight to less-known entities and facts
    ● Strategies:
      – Prefer edges with more frequent predicates
      – Prefer nodes with higher indegree
      – Prefer nodes with higher PageRank
      – …
  – They may cover less-known entities and facts too little
    ● Strategies: the opposite of all of the above
• Bottom line of experimental evaluation:
  – No single strategy fits all
Cochez et al.: Biased Graph Walks for RDF Graph Embeddings. WIMS, 2017
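One of the biasing strategies (prefer nodes with higher indegree) reduces to a weighted edge choice; the graph and degree counts below are made up:

```python
import random

# Toy graph and made-up indegrees: B is the "popular" node
graph = {"A": [("p", "B"), ("p", "C"), ("q", "D")]}
indegree = {"B": 10, "C": 1, "D": 1}

def biased_step(node, rng=random):
    """Pick the next edge with probability proportional to the target
    node's indegree, instead of uniformly at random."""
    edges = graph[node]
    weights = [indegree[t] for _, t in edges]
    return rng.choices(edges, weights=weights, k=1)[0]

rng = random.Random(0)
steps = [biased_step("A", rng) for _ in range(1000)]
frac_B = sum(1 for _, t in steps if t == "B") / len(steps)
print(round(frac_B, 2))  # roughly 10/12 of the steps go to the popular node B
```

Swapping the weights (e.g., 1/indegree) gives the opposite bias towards less-known entities.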
Other Word Embedding Methods
• GloVe (Global Vectors for Word Representation)
• Computes embeddings from co-occurrence statistics
– Using matrix factorization
• Has been applied to random RDF walks as well
• Experimental evaluation:
– In some cases, RDFGloVe outperforms RDF2vec
https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-glove.html
Cochez et al.: Global RDF Vector Space Embeddings, ISWC, 2017
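The co-occurrence statistics that GloVe factorizes can be collected from the same random walks — a minimal sketch over two toy walks:

```python
from collections import Counter

# Two made-up RDF walks; GloVe builds a token co-occurrence matrix
# from such sequences and factorizes it into embedding vectors.
walks = [
    ["Year_Zero", "artist", "Nine_Inch_Nails"],
    ["Nine_Inch_Nails", "member", "Trent_Reznor"],
]

def cooccurrences(walks, window=2):
    """Count how often a token pair appears within `window` positions."""
    cooc = Counter()
    for walk in walks:
        for i, w in enumerate(walk):
            for j in range(i + 1, min(len(walk), i + window + 1)):
                cooc[(w, walk[j])] += 1
    return cooc

cooc = cooccurrences(walks)
print(cooc[("Year_Zero", "Nine_Inch_Nails")])  # 1
```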
Other Word Embedding Methods
• There is a lot of promising stuff not yet tried
– e.g., biasing walks based on human factors
– e.g., more recent word embedding methods such as ELMo and BERT
https://www.nbcnews.com/feature/nbc-out/bert-ernie-are-gay-couple-sesame-street-writer-claims-n910701
TransE and its Descendants
• In RDF2vec, relation preservation is a by-product
• TransE: direct modeling
– Formulates RDF embedding as an optimization problem
– Find a mapping of entities and relations to Rn so that
  ● Σ ||s + p – o|| across all triples <s,p,o> is minimized
  ● the error for existing triples is smaller than for non-existing ones
Bordes et al: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete
Repositories. WI 2016
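The translation idea in a few lines of plain Python (toy, hand-constructed vectors; real TransE learns them via SGD with a margin-based ranking loss over corrupted triples):

```python
import math
import random

random.seed(0)
DIM = 4

def rand_vec():
    return [random.gauss(0, 1) for _ in range(DIM)]

# Made-up entity embeddings; capitalOf is constructed so that
# Berlin + capitalOf = Germany holds exactly.
emb = {e: rand_vec() for e in ["Berlin", "Germany", "Paris", "France"]}
emb["capitalOf"] = [g - b for g, b in zip(emb["Germany"], emb["Berlin"])]

def score(s, p, o):
    """TransE plausibility: smaller ||s + p - o|| means more plausible."""
    return math.sqrt(sum((sv + pv - ov) ** 2
                         for sv, pv, ov in zip(emb[s], emb[p], emb[o])))

good = score("Berlin", "capitalOf", "Germany")  # ~0: the triple fits
bad = score("Berlin", "capitalOf", "France")    # larger: it does not
print(good, bad)
```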
Limitations of TransE
• Symmetric properties
  – we have to minimize ||Barack + spouse – Michelle|| and
    ||Michelle + spouse – Barack|| simultaneously
  – ideally, Barack + spouse = Michelle and Michelle + spouse = Barack
    ● Michelle and Barack become infinitely close
    ● spouse becomes the 0 vector
Limitations of TransE
• Transitive properties
  – we have to minimize ||Miami + partOf – Florida|| and
    ||Florida + partOf – USA||, but also ||Miami + partOf – USA||
  – ideally, Miami + partOf = Florida, Florida + partOf = USA,
    Miami + partOf = USA
    ● again: all three become infinitely close
    ● partOf becomes the 0 vector
Limitations of TransE
• One-to-many properties
  – we have to minimize ||New York + partOf – USA||,
    ||Florida + partOf – USA||, ||Ohio + partOf – USA||, …
  – ideally, New York + partOf = USA, Florida + partOf = USA,
    Ohio + partOf = USA
    ● all the subjects become infinitely close
Limitations of TransE
• Reflexive properties
  – we have to minimize ||Tom + knows – Tom||
  – ideally, Tom + knows = Tom
    ● knows becomes the 0 vector
Limitations of TransE
[Figure: the growing family of embedding models — TransE, TransH, TransR, TransD, KG2E, RESCAL, NTN, DistMult, HolE, ComplEx, RDF2Vec]
• Numerous variants of TransE have been proposed
to overcome limitations (e.g., TransH, TransR, TransD, …)
• Plus: embedding approaches based on tensor factorization etc.
Are we Driving on the Wrong Side of the Road?
• Original ideas:
– Assign meaning to data
– Allow for machine inference
– Explain inference results to the user
Berners-Lee et al: The Semantic Web. Scientific American, May 2001
Running Example: Recommender Systems
• Content-based recommender systems backed by Semantic Web data
  – (today: knowledge graphs)
• Advantages
– use rich background information about recommended items (for free)
– justifications can be generated (e.g., you like movies by that director)
https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/
The 2009 Semantic Web Layer Cake
The 2019 Semantic Web Layer Cake
[Figure: the classic layer cake with an added “Embeddings” layer]
Towards Semantic Vector Space Embeddings
[Figure: entity vectors with interpretable directions such as “cartoon” and “superhero”]
Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
The Holy Grail
• Combine semantics and embeddings
– e.g., directly create meaningful dimensions
– e.g., learn interpretation of dimensions a posteriori
– ...
A New Design Space
[Figure: a design space spanned by quantitative performance vs. semantic interpretability]
Software to Check Out
• http://openke.thunlp.org/
– Implements many embedding approaches
– Pre-trained vectors available, e.g., for Wikidata
Software to Check Out
• Loading RDF in Python: https://github.com/RDFLib/rdflib
RapidMiner Linked Open Data Extension
– caution: works only until RapidMiner 6! :-(
References (1)
• Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5),
28-37.
• Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating
embeddings for modeling multi-relational data. In NIPS (pp. 2787-2795).
• Cochez, M., Ristoski, P., Ponzetto, S. P., & Paulheim, H. (2017). Biased graph walks for RDF
graph embeddings. In WIMS (p. 21). ACM.
• Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805.
• Melo, A., Völker, J., & Paulheim, H. (2017). Type prediction in noisy RDF knowledge bases using
hierarchical multilabel classification with graph and latent features. IJAIT, 26(02).
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
• Paulheim, H., & Fürnkranz, J. (2012). Unsupervised generation of data mining features from
linked open data. In WIMS (p. 31). ACM.
• Paulheim, H., & Bizer, C. (2013). Type inference on noisy RDF data. In International semantic
web conference (pp. 510-525). Springer, Berlin, Heidelberg.
References (2)
• Paulheim, H., & Bizer, C. (2014). Improving the quality of linked data using statistical
distributions. IJSWIS, 10(2), 63-86.
• Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation
methods. Semantic web, 8(3), 489-508.
• Paulheim, H. (2018). Make Embeddings Semantic Again! ISWC (Blue Sky Track)
• Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018).
Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
• Ristoski, P., & Paulheim, H. (2014). A comparison of propositionalization strategies for creating
features from linked open data. Linked Data for Knowledge Discovery, 6.
• Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the web of linked data with rapidminer. Web
Semantics: Science, Services and Agents on the World Wide Web, 35, 142-151.
• Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A
comprehensive survey. Web semantics, 36, 1-22.
• Ristoski, P., & Paulheim, H. (2016). RDF2vec: RDF graph embeddings for data mining. In
International Semantic Web Conference (pp. 498-514). Springer, Cham.
• Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., & Paulheim, H. (2019). RDF2Vec: RDF graph
embeddings and their applications. Semantic Web, 10(4), 1-32.
More Related Content

What's hot (20)

PPTX
The Basics of MongoDB
valuebound
 
PDF
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Edureka!
 
PDF
Vector databases and neural search
Dmitry Kan
 
PDF
JSON-LD and SHACL for Knowledge Graphs
Franz Inc. - AllegroGraph
 
PDF
Spark SQL
Joud Khattab
 
PDF
stackconf 2022: Introduction to Vector Search with Weaviate
NETWAYS
 
PDF
MongoDB Fundamentals
MongoDB
 
PDF
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
PDF
Intro to Web Development Using Python and Django
Chariza Pladin
 
PPTX
bigquery.pptx
Harissh16
 
PPT
Seo and page rank algorithm
Nilkanth Shirodkar
 
PPTX
Knowledge Graph Introduction
Sören Auer
 
PDF
RDF, SPARQL and Semantic Repositories
Marin Dimitrov
 
PPTX
Introduction to Apache Spark
Rahul Jain
 
PDF
Google BigQuery Best Practices
Matillion
 
PPTX
Chapter 1 big data
Prof .Pragati Khade
 
PDF
Word Embeddings, why the hype ?
Hady Elsahar
 
PPTX
Introduction to Django
Knoldus Inc.
 
PPTX
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn
 
PDF
HDFS User Reference
Biju Nair
 
The Basics of MongoDB
valuebound
 
Pyspark Tutorial | Introduction to Apache Spark with Python | PySpark Trainin...
Edureka!
 
Vector databases and neural search
Dmitry Kan
 
JSON-LD and SHACL for Knowledge Graphs
Franz Inc. - AllegroGraph
 
Spark SQL
Joud Khattab
 
stackconf 2022: Introduction to Vector Search with Weaviate
NETWAYS
 
MongoDB Fundamentals
MongoDB
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Intro to Web Development Using Python and Django
Chariza Pladin
 
bigquery.pptx
Harissh16
 
Seo and page rank algorithm
Nilkanth Shirodkar
 
Knowledge Graph Introduction
Sören Auer
 
RDF, SPARQL and Semantic Repositories
Marin Dimitrov
 
Introduction to Apache Spark
Rahul Jain
 
Google BigQuery Best Practices
Matillion
 
Chapter 1 big data
Prof .Pragati Khade
 
Word Embeddings, why the hype ?
Hady Elsahar
 
Introduction to Django
Knoldus Inc.
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn
 
HDFS User Reference
Biju Nair
 

Similar to Machine Learning & Embeddings for Large Knowledge Graphs (20)

PDF
New Adventures in RDF2vec
Heiko Paulheim
 
PDF
New Adventures in RDF2vec
Heiko Paulheim
 
PPTX
RDF2Vec: RDF Graph Embeddings for Data Mining
Petar Ristoski
 
PDF
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Heiko Paulheim
 
PDF
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
Enrico Palumbo
 
PPTX
PyData Los Angeles 2020 (Abhilash Majumder)
Abhilash Majumder
 
PDF
Creating Community at WeWork through Graph Embeddings with node2vec - Karry Lu
Rising Media Ltd.
 
PPTX
Neural Models for Information Retrieval
Bhaskar Mitra
 
PDF
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
GUANGYUAN PIAO
 
PDF
kdd_talk.pdf
ssuser6d9950
 
PDF
kdd_talk.pdf
ssuser6d9950
 
PPTX
Deepwalk vs Node2vec
SiddhantVerma49
 
PDF
Texts Classification with the usage of Neural Network based on the Word2vec’s...
ijsc
 
PDF
Texts Classification with the usage of Neural Network based on the Word2vec’s...
ijsc
 
PDF
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
ijsc
 
PPTX
node2vec: Scalable Feature Learning for Networks.pptx
ssuser2624f71
 
PDF
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
IJDKP
 
ODP
Exploiting Linked Open Data as Background Knowledge in Data Mining
Heiko Paulheim
 
PDF
Deep Learning & NLP: Graphs to the Rescue!
Roelof Pieters
 
PDF
Can Deep Learning Techniques Improve Entity Linking?
Julien PLU
 
New Adventures in RDF2vec
Heiko Paulheim
 
New Adventures in RDF2vec
Heiko Paulheim
 
RDF2Vec: RDF Graph Embeddings for Data Mining
Petar Ristoski
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Heiko Paulheim
 
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
Enrico Palumbo
 
PyData Los Angeles 2020 (Abhilash Majumder)
Abhilash Majumder
 
Creating Community at WeWork through Graph Embeddings with node2vec - Karry Lu
Rising Media Ltd.
 
Neural Models for Information Retrieval
Bhaskar Mitra
 
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
GUANGYUAN PIAO
 
kdd_talk.pdf
ssuser6d9950
 
kdd_talk.pdf
ssuser6d9950
 
Deepwalk vs Node2vec
SiddhantVerma49
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
ijsc
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
ijsc
 
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
ijsc
 
node2vec: Scalable Feature Learning for Networks.pptx
ssuser2624f71
 
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
IJDKP
 
Exploiting Linked Open Data as Background Knowledge in Data Mining
Heiko Paulheim
 
Deep Learning & NLP: Graphs to the Rescue!
Roelof Pieters
 
Can Deep Learning Techniques Improve Entity Linking?
Julien PLU
 
Ad

More from Heiko Paulheim (20)

PDF
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Heiko Paulheim
 
PDF
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Heiko Paulheim
 
PDF
From Wikis to Knowledge Graphs
Heiko Paulheim
 
PPT
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 
PPT
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Heiko Paulheim
 
ODP
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
Heiko Paulheim
 
ODP
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Heiko Paulheim
 
ODP
Make Embeddings Semantic Again!
Heiko Paulheim
 
ODP
How much is a Triple?
Heiko Paulheim
 
ODP
Machine Learning with and for Semantic Web Knowledge Graphs
Heiko Paulheim
 
ODP
Weakly Supervised Learning for Fake News Detection on Twitter
Heiko Paulheim
 
PDF
Towards Knowledge Graph Profiling
Heiko Paulheim
 
ODP
Knowledge Graphs on the Web
Heiko Paulheim
 
ODP
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Heiko Paulheim
 
ODP
Fast Approximate A-box Consistency Checking using Machine Learning
Heiko Paulheim
 
PPT
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Heiko Paulheim
 
ODP
Combining Ontology Matchers via Anomaly Detection
Heiko Paulheim
 
PPT
Gathering Alternative Surface Forms for DBpedia Entities
Heiko Paulheim
 
ODP
What the Adoption of schema.org Tells about Linked Open Data
Heiko Paulheim
 
ODP
Linked Open Data enhanced Knowledge Discovery
Heiko Paulheim
 
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Heiko Paulheim
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Heiko Paulheim
 
From Wikis to Knowledge Graphs
Heiko Paulheim
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Heiko Paulheim
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Heiko Paulheim
 
Make Embeddings Semantic Again!
Heiko Paulheim
 
How much is a Triple?
Heiko Paulheim
 
Machine Learning with and for Semantic Web Knowledge Graphs
Heiko Paulheim
 
Weakly Supervised Learning for Fake News Detection on Twitter
Heiko Paulheim
 
Towards Knowledge Graph Profiling
Heiko Paulheim
 
Knowledge Graphs on the Web
Heiko Paulheim
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Heiko Paulheim
 
Fast Approximate A-box Consistency Checking using Machine Learning
Heiko Paulheim
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Heiko Paulheim
 
Combining Ontology Matchers via Anomaly Detection
Heiko Paulheim
 
Gathering Alternative Surface Forms for DBpedia Entities
Heiko Paulheim
 
What the Adoption of schema.org Tells about Linked Open Data
Heiko Paulheim
 
Linked Open Data enhanced Knowledge Discovery
Heiko Paulheim
 
Ad

Recently uploaded (20)

PPTX
Aict presentation on dpplppp sjdhfh.pptx
vabaso5932
 
PDF
Business implication of Artificial Intelligence.pdf
VishalChugh12
 
PPTX
Comparative Study of ML Techniques for RealTime Credit Card Fraud Detection S...
Debolina Ghosh
 
PDF
Technical-Report-GPS_GIS_RS-for-MSF-finalv2.pdf
KPycho
 
PDF
1750162332_Snapshot-of-Indias-oil-Gas-data-May-2025.pdf
sandeep718278
 
PDF
InformaticsPractices-MS - Google Docs.pdf
seshuashwin0829
 
PDF
The Best NVIDIA GPUs for LLM Inference in 2025.pdf
Tamanna36
 
PPTX
办理学历认证InformaticsLetter新加坡英华美学院毕业证书,Informatics成绩单
Taqyea
 
PDF
apidays Singapore 2025 - Building a Federated Future, Alex Szomora (GSMA)
apidays
 
PDF
UNISE-Operation-Procedure-InDHIS2trainng
ahmedabduselam23
 
PPTX
美国史蒂文斯理工学院毕业证书{SIT学费发票SIT录取通知书}哪里购买
Taqyea
 
PPT
tuberculosiship-2106031cyyfuftufufufivifviviv
AkshaiRam
 
PPTX
b6057ea5-8e8c-4415-90c0-ed8e9666ffcd.pptx
Anees487379
 
PPTX
thid ppt defines the ich guridlens and gives the information about the ICH gu...
shaistabegum14
 
PPTX
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
PDF
apidays Singapore 2025 - How APIs can make - or break - trust in your AI by S...
apidays
 
PDF
Optimizing Large Language Models with vLLM and Related Tools.pdf
Tamanna36
 
PDF
apidays Singapore 2025 - The API Playbook for AI by Shin Wee Chuang (PAND AI)
apidays
 
PDF
SQL for Accountants and Finance Managers
ysmaelreyes
 
PPTX
SlideEgg_501298-Agentic AI.pptx agentic ai
530BYManoj
 
Aict presentation on dpplppp sjdhfh.pptx
vabaso5932
 
Business implication of Artificial Intelligence.pdf
VishalChugh12
 
Comparative Study of ML Techniques for RealTime Credit Card Fraud Detection S...
Debolina Ghosh
 
Technical-Report-GPS_GIS_RS-for-MSF-finalv2.pdf
KPycho
 
1750162332_Snapshot-of-Indias-oil-Gas-data-May-2025.pdf
sandeep718278
 
InformaticsPractices-MS - Google Docs.pdf
seshuashwin0829
 
The Best NVIDIA GPUs for LLM Inference in 2025.pdf
Tamanna36
 
办理学历认证InformaticsLetter新加坡英华美学院毕业证书,Informatics成绩单
Taqyea
 
apidays Singapore 2025 - Building a Federated Future, Alex Szomora (GSMA)
apidays
 
UNISE-Operation-Procedure-InDHIS2trainng
ahmedabduselam23
 
美国史蒂文斯理工学院毕业证书{SIT学费发票SIT录取通知书}哪里购买
Taqyea
 
tuberculosiship-2106031cyyfuftufufufivifviviv
AkshaiRam
 
b6057ea5-8e8c-4415-90c0-ed8e9666ffcd.pptx
Anees487379
 
thid ppt defines the ich guridlens and gives the information about the ICH gu...
shaistabegum14
 
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
apidays Singapore 2025 - How APIs can make - or break - trust in your AI by S...
apidays
 
Optimizing Large Language Models with vLLM and Related Tools.pdf
Tamanna36
 
apidays Singapore 2025 - The API Playbook for AI by Shin Wee Chuang (PAND AI)
apidays
 
SQL for Accountants and Finance Managers
ysmaelreyes
 
SlideEgg_501298-Agentic AI.pptx agentic ai
530BYManoj
 

Machine Learning & Embeddings for Large Knowledge Graphs

  • 1. 7/2/19 Heiko Paulheim 1 Machine Learning & Embeddings for Large Knowledge Graphs Heiko Paulheim
  • 2. 7/2/19 Heiko Paulheim 2 Crossing the Bridge from the Other Side
  • 3. 7/2/19 Heiko Paulheim 3 Crossing the Bridge from the Other Side • There are plenty of established ML and DM toolkits... – Weka – RapidMiner – scikit-learn – R • ...implementing all your favorite algorithms... – Naive Bayes – Random Forests – SVMs – (Deep) Neural Networks – ... • ...but they all work on feature vectors, not graphs!
  • 4. 7/2/19 Heiko Paulheim 4 Typical Tasks • Knowledge Graph Internal – Type prediction – Link prediction – Link validation • Knowledge Graph External – i.e., using the KG as background knowledge in some other task – e.g., content-based recommender systems – e.g., predictive modeling ● who is the next nobel prize winner? Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics. Scientific Programming, 2014 Xu et al.: Explainable Reasoning over Knowledge Graphs for Recommendation. ebay tech blog, 2019
  • 5. 7/2/19 Heiko Paulheim 5 Example: Knowledge Graph Internal • Type prediction – Many instances in KGs are not typed or have very abstract types – e.g., many actors are just typed as persons • Classic approach – Exploit ontology – Shown to be rather sensitive to noise • Example: ontology-based typing of Germany in DBpedia – Airport, Award, Building, City, Country, Ethnic Group, Genre, Language, Military Conflict, Mountain, Mountain Range, Person Function, Place, Populated Place, Race, Route of Transportation, Settlement, Stadium, Wine Region Paulheim & Bizer: Type Inference on Noisy RDF Data. ISWC, 2013 Melo et al.: Type Prediction in Noisy RDF Knowledge Bases using Hierarchical Multilabel Classification with Graph and Latent Features. IJAIT, 2017
  • 6. 7/2/19 Heiko Paulheim 6 Example: Knowledge Graph Internal • Alternative: learn model for type prediction – Train classifier to predict types (binary or hierarchical) – More noise tolerant Paulheim & Bizer: Improving the quality of linked data using statistical distributions. IJSWIS, 2014
  • 7. 7/2/19 Heiko Paulheim 7 Example: Knowledge Graph External • Example machine learning task: predicting book sales ISBN City Sold 3-2347-3427-1 Darmstadt 124 3-43784-324-2 Mannheim 493 3-145-34587-0 Roßdorf 14 ... ISBN City Population ... Genre Publisher ... Sold 3-2347-3427-1 Darm- stadt 144402 ... Crime Bloody Books ... 124 3-43784-324-2 Mann- heim 291458 … Crime Guns Ltd. … 493 3-145-34587-0 Roß- dorf 12019 ... Travel Up&Away ... 14 ... → Crime novels sell better in larger cities Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 8. 7/2/19 Heiko Paulheim 8 Example: The FeGeLOD Framework IS B N 3 -2 3 4 7 -3 4 2 7 -1 C ity D a r m s ta d t # s o ld 1 2 4 N a m e d E n t it y R e c o g n it io n IS B N 3 -2 3 4 7 -3 4 2 7 - 1 C ity D a r m s ta d t # s o ld 1 2 4 C ity _ U R I h ttp : / / d b p e d ia .o r g / r e s o u r c e/ D a r m s ta d t F e a t u r e G e n e r a t io n IS B N 3 - 2 3 4 7 -3 4 2 7 -1 C ity D a r m s ta d t # s o ld 1 2 4 C ity _ U R I h ttp : / / d b p e d ia .o r g / r e s o u r c e / D a r m s ta d t C ity _ U R I_ d b p e d ia -o w l: p o p u la tio n T o ta l 1 4 1 4 7 1 C ity _ U R I_ ... ... F e a t u r e S e le c t io n IS B N 3 -2 3 4 7 -3 4 2 7 - 1 C ity D a r m s ta d t # s o ld 1 2 4 C ity _ U R I h ttp : / / d b p e d ia .o r g / r e s o u r c e/ D a r m s ta d t C ity _ U R I_ d b p e d ia -o w l:p o p u la tio n T o ta l 1 4 1 4 7 1 Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 9. 7/2/19 Heiko Paulheim 9 The FeGeLOD Framework • Entity Recognition – Simple approach: guess DBpedia URIs – Hit rate >95% for cities and countries (by English name) • Feature Generation – augmenting the dataset with additional attributes from KG • Feature Selection – Filter noise: >95% unknown, identical, or different nominals Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 10. 7/2/19 Heiko Paulheim 10 Propositionalization • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors → wanted: a transformation from nodes to sets of features ? Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  • 11. 7/2/19 Heiko Paulheim 11 Propositionalization • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors → wanted: a transformation from nodes to sets of features • Basic strategies: – literal values (e.g., population) are used directly – instance types become binary features – relations are counted (absolute, relative, TF-IDF) – combinations of relations and object types are counted (absolute, relative, TF-IDF) – ... Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
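The basic strategies on this slide can be sketched in a few lines. The following is a minimal, illustrative sketch (not the FeGeLOD implementation); the toy triples and the `propositionalize` helper are assumptions for demonstration only:

```python
from collections import Counter

# Toy triples for one entity; names are illustrative, not from a real KG.
triples = [
    ("Darmstadt", "rdf:type", "City"),
    ("Darmstadt", "populationTotal", 144402),
    ("Darmstadt", "locatedIn", "Hesse"),
    ("Darmstadt", "twinnedWith", "Troyes"),
    ("Darmstadt", "twinnedWith", "Graz"),
]

def propositionalize(entity, triples):
    """Turn the outgoing edges of one node into a flat feature dict."""
    features = {}
    rel_counts = Counter()
    for s, p, o in triples:
        if s != entity:
            continue
        if isinstance(o, (int, float)):      # literal values are used directly
            features[p] = o
        elif p == "rdf:type":                # instance types become binary features
            features[f"type_{o}"] = 1
        else:                                # relations are counted
            rel_counts[f"rel_{p}"] += 1
    features.update(rel_counts)
    return features

print(propositionalize("Darmstadt", triples))
# → {'type_City': 1, 'populationTotal': 144402, 'rel_locatedIn': 1, 'rel_twinnedWith': 2}
```

Relative and TF-IDF weighting would replace the raw counts in the last step; the principle stays the same.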
  • 12. 7/2/19 Heiko Paulheim 12 Propositionalization ctd. • Observations – Even simple features (e.g., add all numbers and types) can help on many problems – More sophisticated features often bring additional improvements ● Combinations of relations and individuals – e.g., movies directed by Steven Spielberg ● Combinations of relations and types – e.g., movies directed by Oscar-winning directors ● … – But ● The search space is enormous! ● Generate first, filter later does not scale well Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  • 13. 7/2/19 Heiko Paulheim 13 From Naive Propositionalization to Knowledge Graph Embeddings • Reconsidering the previous examples: – We want to predict some attribute of a KG entity ● e.g., types ● e.g., sales figures of books – ...given the entity’s vector representation • How do we get a “good” vector representation for an entity? – ...and: what is “good” in the first place?
  • 14. 7/2/19 Heiko Paulheim 14 From Naive Propositionalization to Knowledge Graph Embeddings • How do we get a “good” vector representation for an entity? – ...and: what is “good” in the first place? • “good” for machine learning means separable – similar entities are close together – different entities are further away https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/
  • 15. 7/2/19 Heiko Paulheim 15 A Brief Excursion to word2vec • A vector space model for words • Introduced in 2013 • Each word becomes a vector – similar words are close – relations are preserved – vector arithmetics are possible https://www.adityathakker.com/introduction-to-word2vec-how-it-works/
• 16. 7/2/19 Heiko Paulheim 16 A Brief Excursion to word2vec • Assumption: – Similar words appear in similar contexts ● “{Bush, Obama, Trump} was elected president of the United States” ● “United States president {Bush, Obama, Trump} announced …” • Idea – Train a network that can predict a word from its context (CBOW) or the context from a word (Skip Gram) Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
  • 17. 7/2/19 Heiko Paulheim 17 A Brief Excursion to word2vec • Skip Gram: train a neural network with one hidden layer • Use output values at hidden layer as vector representation • Observation: – Bush, Obama, Trump will activate similar context words – i.e., their output weights at the projection layer have to be similar Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
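The skip-gram training data behind this can be made concrete: for each target word, all words within a fixed window become (target, context) pairs. A minimal sketch (the `skipgram_pairs` helper and the example sentence are illustrative, not from the word2vec codebase):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs as in skip-gram."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the target itself is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "Obama was elected president of the United States".split()
print(skipgram_pairs(sentence)[:4])
# → [('Obama', 'was'), ('Obama', 'elected'), ('was', 'Obama'), ('was', 'elected')]
```

Because Bush, Obama, and Trump occur with the same contexts, they get near-identical training pairs, and hence near-identical hidden-layer vectors.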
• 18. 7/2/19 Heiko Paulheim 18 From word2vec to RDF2vec • Word2vec operates on sentences, i.e., sequences of words • Idea of RDF2vec – First extract “sentences” from a graph – Then train the embedding using word2vec • “Sentences” are extracted by performing random graph walks, e.g.: Year Zero –artist→ Nine Inch Nails –member→ Trent Reznor • Experiments – RDF2vec can be trained on large KGs (DBpedia, Wikidata) – 300-500 dimensional vectors outperform other propositionalization strategies Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
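The walk extraction can be sketched as follows. This is a toy illustration, assuming a dict-based adjacency structure (a real RDF2vec run would walk a full RDF graph such as DBpedia); the graph contents mirror the slide's example:

```python
import random

# Illustrative adjacency: node -> list of (predicate, target) edges.
graph = {
    "Year_Zero":       [("artist", "Nine_Inch_Nails")],
    "Nine_Inch_Nails": [("member", "Trent_Reznor")],
    "Trent_Reznor":    [],
}

def random_walk(graph, start, depth, rng=random):
    """Walk from `start`, emitting alternating node and edge labels
    so that the result reads like a word2vec 'sentence'."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        predicate, node = rng.choice(edges)
        walk += [predicate, node]
    return walk

print(random_walk(graph, "Year_Zero", depth=2))
# → ['Year_Zero', 'artist', 'Nine_Inch_Nails', 'member', 'Trent_Reznor']
```

Each such walk is then fed to word2vec as one sentence, with entities and predicates as "words".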
  • 19. 7/2/19 Heiko Paulheim 19 From word2vec to RDF2vec • RDF2vec example – similar instances form clusters – direction of relations is stable Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
  • 20. 7/2/19 Heiko Paulheim 20 From word2vec to RDF2vec • RecSys example: using proximity in latent RDF2vec feature space Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
• 21. 7/2/19 Heiko Paulheim 21 Extensions of RDF2vec • Maybe uniform random walks are not such a good idea – They may give too much weight to less-known entities and facts ● Strategies: – Prefer edges with more frequent predicates – Prefer nodes with higher indegree – Prefer nodes with higher PageRank – … – They may cover less-known entities and facts too little ● Strategies: – The opposite of all of the above • Bottom line of the experimental evaluation: – No one strategy fits all Cochez et al.: Biased Graph Walks for RDF Graph Embeddings. WIMS, 2017
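A biased walk differs from a uniform one only in how the next edge is drawn. A minimal sketch of one biased step; the edge weights below are illustrative stand-ins for, e.g., predicate frequency or target-node PageRank:

```python
import random

# (predicate, target, weight) — weights are illustrative, e.g., derived
# from predicate frequency or the PageRank of the target node.
edges = [
    ("birthPlace", "Berlin",       0.7),  # frequent predicate / popular node
    ("knownFor",   "ObscureThing", 0.1),
    ("award",      "SomePrize",    0.2),
]

def biased_step(edges, rng=random):
    """Choose the next (predicate, node) proportionally to its weight,
    instead of uniformly as in a plain random walk."""
    weights = [w for _, _, w in edges]
    predicate, node, _ = rng.choices(edges, weights=weights, k=1)[0]
    return predicate, node

random.seed(42)
print(biased_step(edges))
```

Inverting the weights (1/w) yields the opposite family of strategies, which favors less-known entities and facts.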
• 22. 7/2/19 Heiko Paulheim 22 Other Word Embedding Methods • GloVe (Global Vectors for Word Representation) • Computes embeddings from co-occurrence statistics – Using matrix factorization • Has been applied to random RDF walks as well • Experimental evaluation: – In some cases, RDF-GloVe outperforms RDF2vec https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-glove.html Cochez et al.: Global RDF Vector Space Embeddings, ISWC, 2017
  • 23. 7/2/19 Heiko Paulheim 23 Other Word Embedding Methods • There is a lot of promising stuff not yet tried – e.g., biasing walks based on human factors – e.g., more recent word embedding methods such as ELMo and BERT https://www.nbcnews.com/feature/nbc-out/bert-ernie-are-gay-couple-sesame-street-writer-claims-n910701
• 24. 7/2/19 Heiko Paulheim 24 TransE and its Descendants • In RDF2vec, relation preservation is a by-product • TransE: direct modeling – Formulates RDF embedding as an optimization problem – Find a mapping of entities and relations to Rⁿ such that ● Σ ||s + p – o|| across all triples <s,p,o> is minimized ● existing triples obtain a smaller error than non-existing ones Bordes et al.: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013. Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
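The scoring function can be illustrated in two dimensions. The vectors below are hand-picked for illustration, not learned; the point is only that a true triple should score lower (better) than a corrupted one:

```python
import math

def transe_score(s, p, o):
    """TransE plausibility score ||s + p - o||_2 — lower is more plausible."""
    return math.sqrt(sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o)))

# Illustrative 2D embeddings (not learned values).
berlin  = [1.0, 2.0]
germany = [3.0, 2.5]
capital = [2.0, 0.5]
france  = [0.0, 0.0]

# The true triple <Berlin, capitalOf, Germany> scores better (lower)
# than the corrupted triple <Berlin, capitalOf, France>.
print(transe_score(berlin, capital, germany))  # → 0.0
print(transe_score(berlin, capital, france))
```

Training then minimizes a margin-based ranking loss between such true and corrupted triples.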
  • 25. 7/2/19 Heiko Paulheim 25 Limitations of TransE • Symmetric properties – we have to minimize ||Barack + spouse – Michelle|| and ||Michelle + spouse – Barack|| simultaneously – ideally, Barack + spouse = Michelle and Michelle + spouse = Barack ● Michelle and Barack become infinitely close ● spouse becomes 0 vector Michelle Barack
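The degeneracy can be made precise with a two-line derivation: if a symmetric relation is modeled exactly by translation, the relation vector must collapse to zero and both entities must coincide.

```latex
s + p = o \;\wedge\; o + p = s
\;\Rightarrow\; (s + p) + p = s
\;\Rightarrow\; 2p = 0
\;\Rightarrow\; p = 0 \;\wedge\; s = o
```

With s = Barack and o = Michelle, this is exactly the collapse described on the slide.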
  • 26. 7/2/19 Heiko Paulheim 26 Limitations of TransE • Transitive Properties – we have to minimize ||Miami + partOf – Florida|| and ||Florida + partOf – USA||, but also ||Miami + partOf – USA|| – ideally, Miami + partOf = Florida, Florida + partOf = USA, Miami + partOf = USA ● Again: all three become infinitely close ● partOf becomes 0 vector Florida Miami USA
  • 27. 7/2/19 Heiko Paulheim 27 Limitations of TransE • One to many properties – we have to minimize ||New York + partOf – USA||, ||Florida + partOf – USA||, ||Ohio + partOf – USA||, … – ideally, NewYork + partOf = USA, Florida + partOf = USA, Ohio + partOf = USA ● all the subjects become infinitely close Florida USA New York Ohio
  • 28. 7/2/19 Heiko Paulheim 28 Limitations of TransE • Reflexive properties – we have to minimize ||Tom + knows - Tom|| – ideally, Tom + knows = Tom ● Knows becomes 0 vector Tom
• 29. 7/2/19 Heiko Paulheim 29 Limitations of TransE • Numerous variants of TransE have been proposed to overcome these limitations (e.g., TransH, TransR, TransD, …) • Plus: embedding approaches based on tensor factorization etc. (figure: landscape of models including TransE, RDF2Vec, HolE, DistMult, RESCAL, NTN, TransR, TransH, TransD, KG2E, ComplEx)
  • 30. 7/2/19 Heiko Paulheim 30 Are we Driving on the Wrong Side of the Road?
  • 31. 7/2/19 Heiko Paulheim 31 Are we Driving on the Wrong Side of the Road? • Original ideas: – Assign meaning to data – Allow for machine inference – Explain inference results to the user Berners-Lee et al: The Semantic Web. Scientific American, May 2001
  • 32. 7/2/19 Heiko Paulheim 32 Running Example: Recommender Systems • Content based recommender systems backed by Semantic Web data – (today: knowledge graphs) • Advantages – use rich background information about recommended items (for free) – justifications can be generated (e.g., you like movies by that director) https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/
  • 33. 7/2/19 Heiko Paulheim 33 The 2009 Semantic Web Layer Cake
  • 34. 7/2/19 Heiko Paulheim 34 The 2019 Semantic Web Layer Cake Embeddings
• 35. 7/2/19 Heiko Paulheim 35 Towards Semantic Vector Space Embeddings (figure: embedding space with human-interpretable dimensions such as “cartoon” and “superhero”) Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
  • 36. 7/2/19 Heiko Paulheim 36 The Holy Grail • Combine semantics and embeddings – e.g., directly create meaningful dimensions – e.g., learn interpretation of dimensions a posteriori – ...
  • 37. 7/2/19 Heiko Paulheim 37 A New Design Space quantitative performance semantic interpretability
  • 38. 7/2/19 Heiko Paulheim 38 Software to Check Out • http://openke.thunlp.org/ – Implements many embedding approaches – Pre-trained vectors available, e.g., for Wikidata
  • 39. 7/2/19 Heiko Paulheim 39 Software to Check Out • Loading RDF in Python: https://github.com/RDFLib/rdflib
  • 40. 7/2/19 Heiko Paulheim 40 RapidMiner Linked Open Data Extension caution: works only until RM6! :-(
• 41. 7/2/19 Heiko Paulheim 41 References (1) • Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific american, 284(5), 28-37. • Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In NIPS (pp. 2787-2795). • Cochez, M., Ristoski, P., Ponzetto, S. P., & Paulheim, H. (2017). Biased graph walks for RDF graph embeddings. In WIMS (p. 21). ACM. • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. • Melo, A., Völker, J., & Paulheim, H. (2017). Type prediction in noisy RDF knowledge bases using hierarchical multilabel classification with graph and latent features. IJAIT, 26(02). • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. • Paulheim, H., & Fürnkranz, J. (2012). Unsupervised generation of data mining features from linked open data. In WIMS (p. 31). ACM. • Paulheim, H., & Bizer, C. (2013). Type inference on noisy RDF data. In International semantic web conference (pp. 510-525). Springer, Berlin, Heidelberg.
  • 42. 7/2/19 Heiko Paulheim 42 References (2) • Paulheim, H., & Bizer, C. (2014). Improving the quality of linked data using statistical distributions. IJSWIS, 10(2), 63-86. • Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic web, 8(3), 489-508. • Paulheim, H. (2018). Make Embeddings Semantic Again! ISWC (Blue Sky Track) • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365. • Ristoski, P., & Paulheim, H. (2014). A comparison of propositionalization strategies for creating features from linked open data. Linked Data for Knowledge Discovery, 6. • Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the web of linked data with rapidminer. Web Semantics: Science, Services and Agents on the World Wide Web, 35, 142-151. • Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A comprehensive survey. Web semantics, 36, 1-22. • Ristoski, P., & Paulheim, H. (2016). RDF2vec: RDF graph embeddings for data mining. In International Semantic Web Conference (pp. 498-514). Springer, Cham. • Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., & Paulheim, H. (2019). RDF2Vec: RDF graph embeddings and their applications. Semantic Web, 10(4), 1-32.
  • 43. 7/2/19 Heiko Paulheim 43 Machine Learning & Embeddings for Large Knowledge Graphs Heiko Paulheim