Supervised Learning of Universal Sentence Representations
from Natural Language Inference Data
2017/11/7 B4 Hiroki Shimanaka
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes
EMNLP 2017
Abstract & Introduction (1)
Many modern NLP systems rely on word embeddings, previously
trained in an unsupervised manner on large corpora, as base features.
Word embeddings
• Word2Vec (Mikolov et al., 2013)
• GloVe (Pennington et al., 2014)
Efforts to obtain embeddings for larger chunks of text, such as
sentences, have however not been as successful.
Unsupervised Sentence embeddings
• SkipThought (Kiros et al., 2015)
• FastSent (Hill et al., 2016)
In this paper, they show how universal sentence representations
can be trained using the supervised data of the Stanford Natural
Language Inference (SNLI) dataset.
These representations consistently outperform unsupervised methods like
SkipThought vectors (Kiros et al., 2015) on a wide range of transfer tasks.
Abstract & Introduction (2)
Approach
This work combines two research directions, which they describe in
what follows.
First, they explain how the NLI task can be used to train universal sentence
encoding models using the SNLI corpus.
Second, they describe the architectures they investigated for the sentence
encoder, which, in their opinion, cover a suitable range of sentence
encoders currently in use.
SNLI (Stanford Natural Language Inference) dataset
The SNLI dataset consists of 570k human-generated English sentence
pairs, manually labeled with one of three categories: entailment,
contradiction, and neutral.
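For example, a typical pair is the premise "A soccer game with multiple males playing." and the hypothesis "Some men are playing a sport.", labeled entailment.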
https://nlp.stanford.edu/projects/snli/
The Natural Language Inference task
Once the sentence vectors u (for the premise) and v (for the
hypothesis) are generated, three matching methods are applied to
extract relations between them:
(i) concatenation of the two representations (u, v)
(ii) element-wise product u ∗ v
(iii) absolute element-wise difference |u − v|
The resulting vector, which captures
information from both the premise and the
hypothesis, is fed into a 3-class classifier
consisting of multiple fully-connected layers
culminating in a softmax layer.
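A minimal PyTorch-style sketch of this matching-and-classification step. The 512-unit hidden layer follows the training details later in the deck; the 4096-dimensional input vectors and the Tanh nonlinearity are illustrative assumptions, not the paper's exact configuration:

import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    # Combines premise vector u and hypothesis vector v into
    # [u; v; |u - v|; u * v], then classifies into 3 NLI labels.
    def __init__(self, embed_dim=4096, hidden_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * embed_dim, hidden_dim),
            nn.Tanh(),  # nonlinearity is an assumption; the paper's MLP may differ
            nn.Linear(hidden_dim, 3),  # entailment / contradiction / neutral
        )

    def forward(self, u, v):
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
        return self.mlp(features)  # logits; softmax gives class probabilities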
Sentence encoder architectures (1)
LSTM (Hochreiter and Schmidhuber, 1997) and GRU (Cho et al., 2014)
A sentence is represented by the last hidden vector.
BiGRU-last
It concatenates the last hidden state of a forward GRU and the last
hidden state of a backward GRU.
Sentence encoder architectures (2)
BiLSTM with mean/max pooling
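In this variant, the forward and backward hidden states are concatenated at each time step, and the sentence vector u is obtained by taking either the mean or the element-wise maximum over time. A minimal PyTorch-style sketch of the max-pooling variant (the paper's best-performing encoder; layer sizes here are illustrative):

import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    # BiLSTM over word embeddings; the sentence vector u is the
    # element-wise maximum of the hidden states over time.
    def __init__(self, word_dim=300, hidden_dim=2048):
        super().__init__()
        self.lstm = nn.LSTM(word_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, word_embeddings):
        # word_embeddings: (batch, seq_len, word_dim), e.g. fixed GloVe vectors
        states, _ = self.lstm(word_embeddings)  # (batch, seq_len, 2 * hidden_dim)
        u, _ = states.max(dim=1)                # max over the time dimension
        return u

The mean-pooling variant simply replaces the last pooling step with states.mean(dim=1).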
Sentence encoder architectures (3)
Self-attentive sentence encoder (Liu et al., 2016; Lin et al., 2017)
It uses an attention mechanism over the hidden states of a BiLSTM
to generate a representation u of an input sentence.
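As a hedged sketch, a simplified single-head version of such attention pooling might look as follows (the cited encoders use richer formulations, e.g. multiple attention views):

import torch
import torch.nn as nn

class SelfAttentivePooling(nn.Module):
    # Scores each BiLSTM hidden state, normalizes the scores with a
    # softmax over time, and returns the attention-weighted sum as u.
    def __init__(self, state_dim):
        super().__init__()
        self.score = nn.Linear(state_dim, 1, bias=False)

    def forward(self, states):
        # states: (batch, seq_len, state_dim) from a BiLSTM
        weights = torch.softmax(self.score(states), dim=1)  # (batch, seq_len, 1)
        return (weights * states).sum(dim=1)                # (batch, state_dim)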
Sentence encoder architectures (4)
Hierarchical ConvNet
It is similar to AdaSent (Zhao et al., 2015).
The final representation u = [u1, u2, u3, u4] concatenates
representations computed at different levels of the input sentence.
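A rough sketch of the hierarchical idea, assuming four stacked 1-D convolution layers whose max-pooled outputs are concatenated; kernel sizes and channel counts are illustrative, not the paper's exact configuration:

import torch
import torch.nn as nn

class HierarchicalConvEncoder(nn.Module):
    # Stacks 1-D convolutions; each layer's output is max-pooled over
    # time into u_i, and u = [u1, u2, u3, u4] concatenates the levels.
    def __init__(self, word_dim=300, channels=512, depth=4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(word_dim if i == 0 else channels, channels,
                      kernel_size=3, padding=1)
            for i in range(depth)
        ])

    def forward(self, word_embeddings):
        # word_embeddings: (batch, seq_len, word_dim); Conv1d expects (batch, dim, seq_len)
        x = word_embeddings.transpose(1, 2)
        pooled = []
        for conv in self.convs:
            x = torch.relu(conv(x))
            pooled.append(x.max(dim=2).values)  # u_i at this depth
        return torch.cat(pooled, dim=1)         # u = [u1, u2, u3, u4]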
Training details
For all their models trained on SNLI, they use SGD with a learning rate
of 0.1 and a weight decay of 0.99.
At each epoch, they divide the learning rate by 5 if the dev accuracy
decreases.
They use mini-batches of size 64, and training is stopped when the
learning rate goes under the threshold of 10⁻⁵.
For the classifier, they use a multi-layer perceptron with one hidden
layer of 512 hidden units.
They use open-source 300-dimensional GloVe vectors trained on
Common Crawl (840B tokens) as fixed word embeddings.
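A sketch of this schedule; train_epoch and evaluate are hypothetical stand-ins rather than functions from the paper's code, and the reported 0.99 decay is omitted:

# Hypothetical training loop implementing the reported schedule.
lr, best_dev_acc = 0.1, 0.0
while lr >= 1e-5:                              # stop below the 10^-5 threshold
    train_epoch(model, lr=lr, batch_size=64)   # SGD mini-batches of size 64
    dev_acc = evaluate(model, dev_set)
    if dev_acc < best_dev_acc:
        lr /= 5.0                              # divide LR by 5 on a dev-accuracy drop
    best_dev_acc = max(best_dev_acc, dev_acc)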
Evaluation of sentence representations (1)
Binary and multi-class classification
sentiment analysis (MR, SST)
question-type (TREC)
product reviews (CR)
subjectivity/objectivity (SUBJ)
opinion polarity (MPQA)
Entailment and semantic relatedness
They also evaluate on the SICK dataset for both entailment (SICK-E) and
semantic relatedness (SICK-R).
Semantic Textual Similarity: STS14 (Agirre et al., 2014).
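No training is involved for STS14: sentence embeddings are typically compared with cosine similarity and correlated against human judgments. A small sketch, where encode() is a hypothetical sentence encoder:

import numpy as np
from scipy.stats import pearsonr

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# encode() is a hypothetical encoder returning a fixed-size vector;
# sentence_pairs and gold_scores come from the STS14 data.
predicted = [cosine(encode(s1), encode(s2)) for s1, s2 in sentence_pairs]
print(pearsonr(predicted, gold_scores)[0])  # Pearson correlation with human ratings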
Evaluation of sentence representations (2)
Paraphrase detection
Sentence pairs have been human-annotated according to whether they
capture a paraphrase/semantic equivalence relationship (Microsoft
Research Paraphrase Corpus).
Caption-Image retrieval
The caption-image retrieval task evaluates joint image and language feature
models (Hodosh et al., 2013; Lin et al., 2014).
The goal is either to rank a large collection of images by their relevance with
respect to a given query caption (Image Retrieval), or to rank captions by
their relevance for a given query image (Caption Retrieval).
Result (1)–(3)
(These slides show result tables from the paper.)
Conclusion
This paper studies the effects of training sentence embeddings with
supervised data by testing on 12 different transfer tasks.
They showed that models learned on NLI can perform better than
models trained in unsupervised conditions or on other supervised tasks.
They also showed that a BiLSTM network with max pooling yields the
best current universal sentence encoder, outperforming existing
approaches like SkipThought vectors.
Reference
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes.
Supervised Learning of Universal Sentence Representations from Natural
Language Inference Data. EMNLP 2017.


Editor's Notes

  • #3: Word embeddings are useful across many NLP tasks, but it is hard to derive sentence vectors from them. Other tools for generating sentence vectors exist, but they have not produced very good results either.
  • #4: This paper therefore shows how to learn sentence vectors using the supervised SNLI dataset, and that this approach outperforms unsupervised methods such as SkipThought on a wide range of tasks.
  • #5: The approach combines two research directions: first, how the NLI task over the SNLI dataset can be used for universal sentence encoding; second, which sentence encoder is best suited to this task.
  • #6: The entailment, contradiction, and neutral sentences were collected via crowdsourcing based on Flickr30k captions.
  • #7: A 512-dimensional hidden layer and a softmax layer perform the 3-way classification.
  • #13: Classification problems; semantic relatedness.
  • #14: Paraphrase recognition; caption and image retrieval.
  • #16: SkipThought-LN, the previously best sentence encoder, required 64M sentences and about one month of training; the proposed method needs 570k sentence pairs and about one day.
  • #18: This paper studies the effect of training sentence embeddings with supervised data by testing on 12 different transfer tasks. Models learned on NLI can perform better than models trained without supervision or on other supervised tasks, and a BiLSTM network with max pooling yields the best current universal sentence encoder, outperforming existing approaches such as SkipThought vectors.
  • #19: 本稿では、12の異なる転送タスクをテストすることにより、教師付きデータを含む訓練センテンス埋め込みの効果を研究する。 彼らは、NLIで学んだモデルは、監督されていない状態や他の管理対象タスクで訓練されたモデルよりも優れたパフォーマンスを発揮できることを示しました。 彼らは、最大プールを持つBiLSTMネットワークがSkipThoughtベクトルのような既存のアプローチより優れた現在の普遍的なセンシングエンコーディングメソッドを作成することを示しました。