23 MARCH 2018, LEIDEN UNIVERSITY
WORD2VEC TUTORIAL
SUZAN VERBERNE
CONTENTS
 Introduction
 Background
 What is word2vec?
 What can you do with it?
 Practical session
INTRODUCTORY ROUND
ABOUT ME
 Master (2002): Natural Language Processing (NLP)
 PhD (2010): NLP and Information Retrieval (IR)
 Now: Assistant Professor at the Leiden Institute for Advanced
Computer Science (LIACS), affiliated with the Data Science research
programme.
 Research: projects on text mining and information retrieval in
various domains
 Teaching: Data Science & Text Mining
BACKGROUND
WHERE TO START
 Linguistics: Distributional hypothesis
 Data science: Vector Space Model (VSM)
DISTRIBUTIONAL HYPOTHESIS
 Harris, Z. (1954). “Distributional structure”. Word, 10(2–3): 146–162.
 “Words that occur in similar contexts tend to be similar”
VECTOR SPACE MODEL
 Traditional Vector Space Model (Information Retrieval):
 documents and queries represented in a vector space
 where the dimensions are the words
VECTOR SPACE MODEL
 Vector Space Models represent (embed) words or documents in a
continuous vector space
 The vector space is relatively low-dimensional (100–320 dimensions instead of tens of thousands)
 Semantically and syntactically similar words are mapped to nearby
points (Distributional Hypothesis)
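To make “nearby points” concrete: similarity between embedding vectors is usually measured with cosine similarity. A minimal sketch with toy, made-up vectors (not from the slides):
import numpy as np

def cosine_similarity(a, b):
    # cosine of the angle between two vectors: 1.0 = identical direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v_cat = np.array([0.2, 0.8, 0.1])    # toy 3-dimensional "embeddings"
v_dog = np.array([0.25, 0.75, 0.2])
v_car = np.array([0.9, 0.1, 0.4])

print(cosine_similarity(v_cat, v_dog))  # high: similar words end up nearby
print(cosine_similarity(v_cat, v_car))  # lower: dissimilar words are further apart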
PCA projection of a 320-dimensional vector space
WORD2VEC
WHAT IS WORD2VEC?
 Word2vec is a particularly computationally-efficient predictive
model for learning word embeddings from raw text
 So that similar words are mapped to nearby points
EXAMPLE: PATIENT FORUM MINER
WHERE DOES IT COME FROM?
 Neural network language model (NNLM) (Bengio et al., 2003 – not
discussed today)
 Mikolov previously proposed (MSc thesis, PhD thesis, other papers)
to first learn word vectors “using neural network with a single
hidden layer” and then train the NNLM independently
 Word2vec directly extends this work => “word vectors learned
using a simple model”
 Many architectures and models have been proposed for computing
word vectors (e.g. see Socher’s Stanford group work which resulted
in GloVe - http://nlp.stanford.edu/projects/glove/)
WHAT IS WORD2VEC (NOT)?
 It is:
 Neural network-based
 Word embeddings
 VSM-based
 Co-occurrence-based
 It is not:
 Deep learning
Word embedding is the collective name for a
set of language modeling and feature learning
techniques in natural language processing
(NLP) where words or phrases from the
vocabulary are mapped to vectors of real
numbers in a low-dimensional space relative to
the vocabulary size
WORDS IN CONTEXT
HOW IS THE MODEL TRAINED?
 Move through the training corpus with a sliding window. Each word is a prediction problem:
 the objective is to predict the current word from its context (or vice versa)
 The outcome of the prediction determines how we adjust the current word vectors. Gradually, the vectors converge to (hopefully) optimal values
Note that prediction is not an aim in itself here: it is just a proxy to learn vector representations that are good for other tasks
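As an illustration (my sketch, not from the original slides), this is how the sliding window turns raw text into prediction problems, here in the skip-gram direction (the current word predicts its context words):
def skipgram_pairs(tokens, window=2):
    # generate (current word, context word) training pairs with a sliding window
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("the cat sat on the mat".split(), window=2))
# e.g. ('sat', 'the'), ('sat', 'cat'), ('sat', 'on'), ('sat', 'the'), ...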
OTHER VECTOR SPACE SEMANTICS TECHNIQUES
 Some “classic” NLP techniques for estimating continuous representations of words:
 LSA (Latent Semantic Analysis)
 LDA (Latent Dirichlet Allocation)
 Distributed representations of words learned by neural networks
outperform LSA on various tasks
 LDA is computationally expensive and cannot be trained on very
large datasets
ADVANTAGES OF WORD2VEC
 It scales
 Train on billion-word corpora
 In limited time
 Possibility of parallel training
 Pre-trained word embeddings trained by one can be used by others
 For entirely different tasks
 Incremental training
 Train on one piece of data, save the results, continue training later on (see the sketch below)
 There is a Python module for it:
 Gensim word2vec
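A sketch of incremental training with gensim (same, older API style as the DO IT YOURSELF example below; the model path and more_sentences are placeholders):
import gensim

model = gensim.models.Word2Vec.load("my_model")   # previously saved model (hypothetical path)
model.build_vocab(more_sentences, update=True)    # extend the vocabulary with the new data
model.train(more_sentences, total_examples=len(more_sentences), epochs=5)
model.save("my_model")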
WHAT CAN YOU DO WITH IT?
DO IT YOURSELF
 Implementation in Python package gensim
import gensim
model = gensim.models.Word2Vec(sentences, size=100, window=5, min_count=5, workers=4)
size: the dimensionality of the feature vectors (common: 200 or 320)
window: the maximum distance between the current and predicted word
within a sentence
min_count: minimum number of occurrences of a word in the corpus to be
included in the model
workers: for parallelization on multicore machines
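Note (my addition): sentences is expected to be an iterable of tokenized sentences, for example:
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
]
# for large corpora, a streaming iterator can be used instead, e.g.:
# sentences = gensim.models.word2vec.LineSentence("corpus.txt")  # one tokenized sentence per line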
DO IT YOURSELF
model.most_similar('apple')
 [('banana', 0.8571481704711914), ...]
model.doesnt_match("breakfast cereal dinner lunch".split())
 'cereal'
model.similarity('woman', 'man')
 0.73723527
WHAT CAN YOU DO WITH IT?
 A is to B as C is to ?
 This is the famous example:
vector(king) – vector(man) + vector(woman) = vector(queen)
 Actually, what the original paper says is: if you subtract the vector for ‘man’ from the one for ‘king’ and add the vector for ‘woman’, the vector closest to the one you end up with turns out to be the one for ‘queen’.
 More interesting:
France is to Paris as Germany is to …
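In gensim, these analogy queries can be expressed with most_similar, where positive vectors are added and negative vectors subtracted (illustrative output; actual results depend on the trained model):
model.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)
#  e.g. [('queen', 0.71)]
model.most_similar(positive=['Paris', 'Germany'], negative=['France'], topn=1)
#  e.g. [('Berlin', 0.74)]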
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed
representations of words and phrases and their compositionality. In Advances in
Neural Information Processing Systems, pages 3111–3119, 2013.
WHAT CAN YOU DO WITH IT?
 A is to B as C is to ?
 It also works for syntactic relations:
 vector(biggest) – vector(big) + vector(small) = vector(smallest)
WHAT CAN YOU DO WITH IT?
 Mining knowledge about natural language
 Improve NLP applications
 Improve Text Mining applications
WHAT CAN YOU DO WITH IT?
 Mining knowledge about natural language
 Selecting out-of-the-list words
 Example: which word does not belong in [monkey, lion, dog, truck]
 Selectional preferences
 Example: predict typical verb-noun pairs: people as subject of eating is more
likely than people as object of eating
 Improve NLP applications:
 Sentence completion/text prediction
 Bilingual Word Embeddings for Phrase-Based Machine Translation
WHAT CAN YOU DO WITH IT?
 Improve Text Mining applications:
 (Near-)Synonym detection (→ query expansion)
 Concept representation of texts
 Example: Twitter sentiment classification
 Document similarity
 Example: cluster news articles per news event
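For instance (my sketch), gensim can compare two short tokenized documents by the cosine similarity of their averaged word vectors, assuming all words occur in the model's vocabulary:
doc1 = "government raises taxes".split()
doc2 = "parliament approves new tax plan".split()
print(model.n_similarity(doc1, doc2))   # similarity of the mean word vectors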
DEAL WITH PHRASES
 Entities as single words:
 during pre-processing: find collocations (e.g. Wikipedia anchor texts),
and feed them as single ‘words’ to the neural network.
 Bigrams (built-in function)
bigram = gensim.models.Phrases(sentence_stream)
term_list = list(bigram[sentence_stream])
model = gensim.models.Word2Vec(term_list)
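Illustration (my sketch): detected collocations are joined into single tokens (with gensim's default '_' delimiter), so they get their own vector:
sentence_stream = [["new", "york", "is", "a", "city"],
                   ["i", "visited", "new", "york"]]
bigram = gensim.models.Phrases(sentence_stream, min_count=1, threshold=1)
print(list(bigram[sentence_stream]))
# e.g. [['new_york', 'is', 'a', 'city'], ['i', 'visited', 'new_york']]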
HOW TO EVALUATE A MODEL
 There is a comprehensive test set (from the original word2vec paper) that contains five types of semantic questions and nine types of syntactic questions:
 8,869 semantic questions
 10,675 syntactic questions
 https://rare-technologies.com/word2vec-tutorial/
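A sketch of running this evaluation with gensim, assuming the questions-words.txt analogy test set is available locally:
results = model.accuracy("questions-words.txt")            # older gensim API
# newer gensim versions:
# results = model.wv.evaluate_word_analogies("questions-words.txt")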
FURTHER READING
 T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
 T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed
representations of words and phrases and their compositionality. In
Advances in Neural Information Processing Systems, pages 3111–3119,
2013.
 http://radimrehurek.com/2014/02/word2vec-tutorial/
 http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-
model/
 https://rare-technologies.com/word2vec-tutorial/
PRACTICAL SESSION
PRE-TRAINED WORD2VEC MODELS
 We will compare two pre-trained word2vec models:
 A model trained on a Dutch web corpus (COW)
 A model trained on Dutch web forum data
 Goal: inspect the differences between the models
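A sketch of how such a comparison could look (the file names are hypothetical placeholders for the two pre-trained models):
from gensim.models import KeyedVectors

web_model = KeyedVectors.load_word2vec_format("cow_model.bin", binary=True)
forum_model = KeyedVectors.load_word2vec_format("forum_model.bin", binary=True)

word = "dokter"   # any Dutch word that occurs in both vocabularies
print(web_model.most_similar(word, topn=5))
print(forum_model.most_similar(word, topn=5))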
THANKS
 Traian Rebedea
 Tom Kenter
 Mohammad Mahdavi
 Satoshi Sekine