Unsupervised Learning of Sentence Embeddings using
Compositional n-Gram Features
2018/07/12  Presenter: Hiroki Shimanaka (M1)
Paper authors: Matteo Pagliardini, Prakhar Gupta and Martin Jaggi
NAACL 2018
Abstract & Introduction (1)
• Unsupervised word representations such as Word2Vec [Mikolov et al., 2013] are now routinely trained on very large amounts of raw text data and have become ubiquitous building blocks of a majority of current state-of-the-art NLP applications.
• While very useful semantic representations are available for words, it remains challenging to produce and learn such semantic embeddings for longer pieces of text, such as sentences.
Abstract & Introduction (2)
• A strong trend in deep learning for NLP leads towards increasingly powerful and complex models.
  ➢ While extremely strong in expressiveness, the increased model complexity makes such models much slower to train on larger datasets.
• On the other hand, simpler “shallow” models such as matrix factorizations (or bilinear models) can benefit from training on much larger sets of data, which can be a key advantage, especially in the unsupervised setting.
Abstract & Introduction (3)
• The authors present a simple but efficient unsupervised objective to train distributed representations of sentences.
• Their method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Approach
• Their proposed model (Sent2Vec) can be seen as an extension of the C-BOW [Mikolov et al., 2013] training objective to train sentence instead of word embeddings.
• Sent2Vec is a simple unsupervised model allowing to compose sentence embeddings using word vectors along with n-gram embeddings, simultaneously training composition and the embedding vectors themselves.
Model (1)
• Their model is inspired by simple matrix factorization models (bilinear models) such as those recently used very successfully in unsupervised learning of word embeddings.
  - U ∈ ℝ^{k×h}: target word vectors
  - V ∈ ℝ^{h×|𝒱|}: learnt source word vectors
  - 𝒱: vocabulary
  - h: hidden size (embedding dimension)
  - ι_S ∈ {0, 1}^{|𝒱|}: binary vector encoding S
  - S: sentence
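For reference, these symbols plug into the generic matrix-factorization training objective stated in the paper (covering both C-BOW and Sent2Vec):

```latex
\min_{\mathbf{U},\mathbf{V}} \;\sum_{S \in \mathcal{C}} f_{S}\!\left(\mathbf{U}\mathbf{V}\,\iota_{S}\right)
```

where 𝒞 is the training corpus of sentences and f_S is a model-specific cost function.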
Model (2)
• Conceptually, Sent2Vec can be interpreted as a natural extension of the word-contexts from C-BOW [Mikolov et al., 2013] to a larger sentence context.
  - R(S): the list of n-grams (including unigrams) in sentence S
  - ι_{R(S)} ∈ {0, 1}^{|𝒱|}: binary vector encoding R(S)
  - S: sentence
  - v_w: source (or context) embedding
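The composition itself is not spelled out on the slide; in the paper, the sentence embedding v_S is simply the average of the source embeddings of all n-grams in R(S):

```latex
\mathbf{v}_{S} := \frac{1}{|R(S)|}\,\mathbf{V}\,\iota_{R(S)} = \frac{1}{|R(S)|} \sum_{w \in R(S)} \mathbf{v}_{w}
```

A minimal Python sketch of this averaging step, assuming a lookup table `embeddings` from n-gram strings to vectors (`extract_ngrams` and the other names are illustrative, not the authors' released code):

```python
import numpy as np

def extract_ngrams(tokens, n_max=2):
    """R(S): all unigrams and higher-order n-grams (up to n_max) of the sentence."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def sentence_embedding(tokens, embeddings, dim, n_max=2):
    """v_S: average of the source vectors v_w over all n-grams w in R(S)."""
    vecs = [embeddings[g] for g in extract_ngrams(tokens, n_max) if g in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```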
Model (3)
• Negative sampling
  - N_{w_t}: the set of words sampled negatively for the word w_t ∈ S
  - S: sentence
  - v_w: source (or context) embedding
  - u_w: target embedding
  - f_w: the normalized frequency of w in the corpus
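Combining these symbols, the negative-sampling objective from the paper applies the binary logistic loss ℓ(x) := log(1 + e^{−x}) to the target word scored against the rest of the sentence, and to the negated score of each sampled negative:

```latex
\min_{\mathbf{U},\mathbf{V}} \sum_{S \in \mathcal{C}} \sum_{w_t \in S}
\Big( \ell\big(\mathbf{u}_{w_t}^{\top}\mathbf{v}_{S \setminus \{w_t\}}\big)
 + \sum_{w' \in N_{w_t}} \ell\big(-\mathbf{u}_{w'}^{\top}\mathbf{v}_{S \setminus \{w_t\}}\big) \Big)
```

Here v_{S∖{w_t}} denotes the sentence embedding computed with the target word w_t excluded from R(S).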
Model (4)
• Subsampling
  - To select the possible target unigrams (positives), they use subsampling as in [Joulin et al., 2017; Bojanowski et al., 2017], each word w being discarded with probability 1 − q_p(w), where q_p(w) := min{1, √(t/f_w) + t/f_w} and t is the subsampling hyper-parameter.
  - N_{w_t}: the set of words sampled negatively for the word w_t ∈ S
  - S: sentence
  - v_w: source (or context) embedding
  - u_w: target embedding
  - q_n(w) := √f_w / Σ_{w_i∈𝒱} √f_{w_i}: the distribution used to sample negatives
  - f_w: the normalized frequency of w in the corpus
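A minimal sketch of how the two sampling distributions above could be computed from raw corpus counts (the function and variable names, and the default t, are my own; the formulas follow the paper's definitions):

```python
import numpy as np

def sampling_distributions(counts, t=1e-5):
    """counts: dict mapping word -> raw frequency in the corpus.
    Returns (q_p, q_n): keep-probability for positive targets and the
    negative-sampling distribution."""
    words = list(counts)
    raw = np.array([counts[w] for w in words], dtype=float)
    f = raw / raw.sum()                              # normalized frequencies f_w
    q_p = np.minimum(1.0, np.sqrt(t / f) + t / f)    # keep target w with prob q_p(w)
    q_n = np.sqrt(f) / np.sqrt(f).sum()              # sample negatives w' ~ q_n(w')
    return dict(zip(words, q_p)), dict(zip(words, q_n))
```

For example, `sampling_distributions({'the': 1000, 'cat': 30, 'sat': 20})` returns one dict of keep-probabilities for positives and one negative-sampling distribution over the same words.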
Experimental Setting (1)
• Dataset:
  - the Toronto Book Corpus (70M sentences)
  - Wikipedia sentences and tweets
• Dropout:
  - For each sentence they use dropout on its list of n-grams R(S) ∖ U(S), where U(S) is the set of all unigrams contained in sentence S.
  - They find that dropping K n-grams (n > 1) for each sentence gives superior results compared to dropping each token with some fixed probability (see the sketch after this list).
• Regularization:
  - L1 regularization applied to the word vectors.
• Optimizer:
  - SGD
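A rough sketch of the n-gram dropout described above: drop K randomly chosen n-grams with n > 1 from R(S) while always keeping the unigrams U(S). The function and its names are my own illustration, not the released implementation:

```python
import random

def drop_ngrams(ngrams, unigrams, k=2):
    """n-gram dropout: remove k randomly chosen n-grams (n > 1) from R(S),
    keeping all unigrams in U(S)."""
    higher_order = [g for g in ngrams if g not in unigrams]
    dropped = set(random.sample(higher_order, min(k, len(higher_order))))
    return [g for g in ngrams if g not in dropped]
```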
Experimental Setting (2)
Evaluation Tasks
• Transfer tasks:
  • Supervised
    - MR: movie review sentiment
    - CR: product reviews
    - SUBJ: subjectivity classification
    - MPQA: opinion polarity
    - TREC: question type classification
    - MSRP: paraphrase identification
  • Unsupervised
    - SICK: semantic relatedness task
    - STS: semantic textual similarity
Experimental Results (1)
Experimental Results (2)
Experimental Results (3)
Conclusion
• In this paper, they introduce a novel, computationally efficient, unsupervised, C-BOW-inspired method to train and infer sentence embeddings.
• On supervised evaluations, their method on average achieves better performance than all other unsupervised competitors, with the exception of Skip-Thought [Kiros et al., 2015].
• However, their model is generalizable, extremely fast to train, simple to understand and easily interpretable, showing the relevance of simple and well-grounded representation models in contrast to models using deep architectures.
• Future work could focus on augmenting the model to exploit data with ordered sentences.
