Seq2seq
...and beyond
Hello! I am Roberto Silveira
EE engineer, ML enthusiast
rsilveira79@gmail.com
@rsilveira79
Sequence
Is a matter of time
RNN
Is what you need!
Basic Recurrent cells (RNN)
Source: http://colah.github.io/
Issues
× Difficulty dealing with long-term
dependencies
× Difficult to train - vanishing gradient issues
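To make the recurrence concrete, here is a minimal NumPy sketch of one vanilla RNN step (names and sizes are illustrative, not any library's API). The repeated multiplication by W_h across time steps is exactly what makes gradients vanish or explode over long sequences.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions (assumptions): 4-d input, 3-d hidden state
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 4)):      # unroll over a 5-step toy sequence
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h)
```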
Long term issues
Source: http://colah.github.io/,
CS224d notes
Sentence 1
"Jane walked into the room. John walked
in too. Jane said hi to ___"
Sentence 2
"Jane walked into the room. John walked in
too. It was late in the day, and everyone was
walking home after a long day at work. Jane
said hi to ___"
LSTM in 2 min...
Review
× Addresses long-term dependencies
× More complex to train
× Very powerful, given lots of data
Source: http://colah.github.io/
LSTM in 2 min...
Review
× Addresses long-term dependencies
× More complex to train
× Very powerful, given lots of data
Cell state
Source: http://colah.github.io/
Forget
gate
Input
gate
Output
gate
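As a rough sketch (assumed parameter layout, not any particular library's API), one LSTM step in NumPy, showing how the forget, input and output gates update the cell state:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold stacked gate parameters:
    blocks 0..3 are the forget, input, output and candidate ("g") parts."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b                 # all gate pre-activations, shape (4*n,)
    f = sigmoid(z[0*n:1*n])                      # forget gate: what to erase from c
    i = sigmoid(z[1*n:2*n])                      # input gate: what to write to c
    o = sigmoid(z[2*n:3*n])                      # output gate: what to expose as h
    g = np.tanh(z[3*n:4*n])                      # candidate cell values
    c = f * c_prev + i * g                       # new cell state
    h = o * np.tanh(c)                           # new hidden state
    return h, c

# Tiny usage with toy sizes (assumptions)
n, d = 3, 4
rng = np.random.default_rng(0)
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n),
                 rng.normal(size=(4*n, d)), rng.normal(size=(4*n, n)), np.zeros(4*n))
```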
Gated recurrent unit (GRU) in 2 min ...
Review
× Fewer hyperparameters
× Trains faster
× Better solutions with less data
Source: http://www.wildml.com/,
arXiv:1412.3555
Gated recurrent unit (GRU) in 2 min ...
Review
× Fewer hyperparameters
× Trains faster
× Better solutions with less data
Source: http://www.wildml.com/,
arXiv:1412.3555
Reset
gate
Update
gate
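The corresponding GRU step, again only a sketch with assumed names: just reset and update gates, and no separate cell state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step. Blocks 0..2 of W, U, b are the reset, update and
    candidate parts; the hidden state is the only memory."""
    n = h_prev.shape[0]
    z = W @ x_t + b                                        # input contributions for all 3 blocks
    r = sigmoid(z[0*n:1*n] + U[0*n:1*n] @ h_prev)          # reset gate: how much past to use
    u = sigmoid(z[1*n:2*n] + U[1*n:2*n] @ h_prev)          # update gate: how much to overwrite
    h_cand = np.tanh(z[2*n:3*n] + U[2*n:3*n] @ (r * h_prev))  # candidate state
    return (1 - u) * h_prev + u * h_cand                   # interpolate old and new state

# Tiny usage with toy sizes (assumptions)
n, d = 3, 4
rng = np.random.default_rng(0)
h = gru_step(rng.normal(size=d), np.zeros(n),
             rng.normal(size=(3*n, d)), rng.normal(size=(3*n, n)), np.zeros(3*n))
```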
Seq2seq
learning
Or encoder-decoder
architectures
Variable size input - output
Source: http://karpathy.github.io/
Variable size input - output
Source: http://karpathy.github.io/
Basic idea
"Variable" size input (encoder) ->
Fixed size vector representation ->
"Variable" size output (decoder)
"Machine",
"Learning",
"is",
"fun"
"Aprendizado",
"de",
"Máquina",
"é",
"divertido"
0.636
0.122
0.981
Input
One word at a time
Stateful Model
Stateful Model
Encoded
Sequence
Output
One word at a time
First RNN
(Encoder)
Second
RNN
(Decoder)
Memory of previous word influences next result
Memory of previous word influences next result
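A minimal PyTorch sketch of this two-RNN setup (class names and sizes are assumptions, not the paper's exact architecture): the encoder reads the source one word at a time, and its final state is the fixed-size encoded sequence handed to the decoder.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len) of word ids
        _, (h, c) = self.rnn(self.embed(src))     # keep only the final state
        return h, c                               # the fixed-size "encoded sequence"

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, state):                # at inference, tgt is one word at a time
        output, state = self.rnn(self.embed(tgt), state)
        return self.out(output), state            # logits over the target vocabulary

# Toy usage (vocabulary sizes are made up)
enc, dec = Encoder(vocab_size=5000), Decoder(vocab_size=6000)
state = enc(torch.randint(0, 5000, (1, 4)))               # "Machine Learning is fun" as ids
logits, state = dec(torch.randint(0, 6000, (1, 1)), state)  # decode one target word
```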
Sequence to Sequence Learning with Neural Networks (2014)
"Machine",
"Learning",
"is",
"fun"
"Aprendizado",
"de",
"Máquina",
"é",
"divertido"
0.636
0.122
0.981
1000d word
embeddings
4 layers
1000
cells/layer
Encoded
Sequence
LSTM
(Encoder)
LSTM
(Decoder)
Source: arXiv 1409.3215v3
TRAINING → SGD w/o momentum, fixed learning rate of 0.7, 7.5 epochs, batches of 128
sentences, 10 days of training (WMT'14 English-to-French dataset)
4 layers
1000
cells/layer
Recurrent encoder-decoders
Les chiens aiment les os <EOS> Dogs love bones
Dogs love bones <EOS>
Source Sequence Target Sequence
Source: arXiv 1409.3215v3
Recurrent encoder-decoders
Les chiens aiment les os <EOS> Dogs love bones
Dogs love bones <EOS>
Source: arXiv 1409.3215v3
Recurrent encoder-decoders
Les chiens aiment les os <EOS> Dogs love bones
Dogs love bones <EOS>
Source: arXiv 1409.3215v3
Source: arXiv 1409.3215v3
Recurrent encoder-decoders - issues
● Difficult to cope with long sentences (longer than those in the training corpus)
● Decoder w/ attention mechanism → relieves the encoder from squashing everything into a
fixed-length vector
NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE (2015)
Source: arXiv 1409.0473v7
Decoder
Context vector for
each target word
Weights of each
annotation hj
NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE (2015)
Source: arXiv 1409.0473v7
Decoder
Context vector for
each target word
Weights of each
annotation hj
Non-monotonic
alignment
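In code, the context vector and annotation weights look roughly like this additive-attention sketch (matrix names W_a, U_a, v_a are illustrative, not the paper's notation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(s_prev, H, W_a, U_a, v_a):
    """One decoder step of additive attention.
    s_prev : previous decoder state, shape (n,)
    H      : encoder annotations h_1..h_T, shape (T, n)"""
    scores = np.tanh(s_prev @ W_a + H @ U_a) @ v_a   # alignment scores e_ij, shape (T,)
    alpha = softmax(scores)                           # weights of each annotation h_j
    context = alpha @ H                               # context vector for this target word
    return context, alpha

# Tiny usage with toy sizes (assumptions)
T, n, d = 6, 5, 7
rng = np.random.default_rng(0)
ctx, alpha = attention_context(rng.normal(size=n), rng.normal(size=(T, n)),
                               rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                               rng.normal(size=d))
```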
Attention models for NLP
Source: arXiv 1409.0473v7
Les chiens aiment les os <EOS>
+
<EOS>
Attention models for NLP
Source: arXiv 1409.0473v7
Les chiens aiment les os <EOS>
+
<EOS>
Dogs
Attention models for NLP
Source: arXiv 1409.0473v7
Les chiens aiment les os <EOS>
+
<EOS>
Dogs
Dogs
love
+
Attention models for NLP
Source: arXiv 1409.0473v7
Les chiens aiment les os <EOS>
+
<EOS>
Dogs
Dogs
love
+
love
bones+
Challenges in using the model
● Cannot handle truly variable-size input
● Hard to deal with both short and long sentences
● Capture context and semantic meaning
Solutions: PADDING, BUCKETING, WORD EMBEDDINGS
Source: http://suriyadeepan.github.io/
padding
Source: http://suriyadeepan.github.io/
EOS : End of sentence
PAD : Filler
GO : Start decoding
UNK : Unknown; word not in vocabulary
Q : "What time is it? "
A : "It is seven thirty."
Q : [ PAD, PAD, PAD, PAD, PAD, “?”, “it”,“is”, “time”, “What” ]
A : [ GO, “It”, “is”, “seven”, “thirty”, “.”, EOS, PAD, PAD, PAD ]
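A small Python sketch of the padding scheme above (the helper name is made up for illustration): the question is reversed and left-padded for the encoder, the answer gets GO/EOS and right-padding for the decoder.

```python
PAD, GO, EOS = "PAD", "GO", "EOS"

def pad_pair(question_tokens, answer_tokens, enc_len=10, dec_len=10):
    """Build one padded encoder/decoder pair, mirroring the Q/A example above."""
    enc = list(reversed(question_tokens))             # reversed source, as in the example
    enc = [PAD] * (enc_len - len(enc)) + enc          # left-pad the encoder input
    dec = [GO] + list(answer_tokens) + [EOS]
    dec = dec + [PAD] * (dec_len - len(dec))          # right-pad the decoder target
    return enc, dec

q = ["What", "time", "is", "it", "?"]
a = ["It", "is", "seven", "thirty", "."]
print(pad_pair(q, a))
```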
Source: https://www.tensorflow.org/
bucketing
Efficiently handle sentences of different lengths
Ex: the largest sentence in the corpus has 100 tokens
Short sentences like "How are you?" → lots of PAD
Bucket list: [(5, 10), (10, 15), (20, 25), (40, 50)]
(default in TensorFlow's translate.py)
Q : [ PAD, PAD, “.”, “go”,“I”]
A : [GO "Je" "vais" "." EOS PAD PAD PAD PAD PAD]
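A hypothetical bucket-selection helper, just to illustrate the idea: each pair is padded only up to the smallest bucket that fits it, instead of up to the longest sentence in the corpus.

```python
BUCKETS = [(5, 10), (10, 15), (20, 25), (40, 50)]   # (encoder_len, decoder_len)

def pick_bucket(src_len, tgt_len, buckets=BUCKETS):
    """Return the index of the smallest bucket the pair fits into."""
    for i, (s, t) in enumerate(buckets):
        if src_len <= s and tgt_len <= t:
            return i
    return len(buckets) - 1          # fall back to the largest bucket (or truncate)

print(pick_bucket(3, 5))             # "I go ." / "Je vais ." fits the first bucket
```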
Word embeddings (remember previous presentation ;-)
Distributed representations → syntactic and semantic information is captured
"Take" = [0.286, 0.792, -0.177, -0.107, 0.109, -0.542, 0.349, 0.271]
Word embeddings (remember previous presentation ;-)
Linguistic regularities (recap)
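A toy sketch of the vector arithmetic behind those regularities. The embeddings here are random placeholders, so the analogy only works meaningfully with trained vectors such as word2vec or GloVe.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Placeholder embedding table (random vectors); real regularities need trained embeddings.
emb = {w: np.random.randn(8) for w in ["king", "man", "woman", "queen", "apple"]}

query = emb["king"] - emb["man"] + emb["woman"]            # the classic analogy query
candidates = [w for w in emb if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(emb[w], query))
print(best)   # with trained vectors this tends to be "queen"
```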
Phrase representations (Paper - Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation)
Source: arXiv 1406.1078v3
Phrase representations (Paper - Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation)
Source: arXiv 1406.1078v3
1000d vector representation
applications
Neural conversational model - chatbots
Source: arXiv 1506.05869v3
Google Smart reply
Google Smart reply
Source: arXiv 1606.04870v1
Interesting facts
● Currently responsible for 10% of Inbox replies
● Training set: 238 million messages
Google Smart reply
Source: arXiv 1606.04870v1
Seq2Seq
model
Interesting facts
● Currently responsible for 10% of Inbox replies
● Training set: 238 million messages
Feedforward
triggering
model
Semi-supervised
semantic clustering
Image captioning (Paper - Show and Tell: A Neural Image Caption Generator)
Source: arXiv 1411.4555v2
Image captioning (Paper - Show and Tell: A Neural Image Caption Generator)
Encoder
Decoder
Source: arXiv 1411.4555v2
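A rough PyTorch sketch of the Show-and-Tell idea (sizes and names are assumptions): a CNN image feature plays the encoder role and is fed as the first step of an LSTM decoder that generates the caption.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Sketch only: project a CNN image feature into embedding space and use it
    as step 0 of an LSTM language model over caption words."""
    def __init__(self, feat_dim=2048, emb_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, emb_dim)   # CNN encoder output -> embedding space
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, caption):              # img_feat: (B, feat_dim), caption: (B, L)
        x = torch.cat([self.img_proj(img_feat).unsqueeze(1),   # image as the first time step
                       self.embed(caption)], dim=1)
        h, _ = self.lstm(x)
        return self.out(h)                              # logits for each next word

model = CaptionDecoder()
logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 7)))
```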
What's next?
And so?
Multi-task sequence to sequence (Paper - MULTI-TASK SEQUENCE TO SEQUENCE LEARNING)
Source: arXiv 1511.06114v4
One-to-Many
(common encoder)
Many-to-One
(common decoder)
Many-to-Many
Neural programmer (Paper - NEURAL PROGRAMMER: INDUCING LATENT PROGRAMS WITH GRADIENT DESCENT)
Source: arXiv 1511.04834v3
Unsupervised pre-training for seq2seq - 2017 (Paper - UNSUPERVISED PRETRAINING FOR SEQUENCE TO SEQUENCE LEARNING)
Source: arXiv 1611.02683v1
Unsupervised pre-training for seq2seq - 2017 (Paper - UNSUPERVISED PRETRAINING FOR SEQUENCE TO SEQUENCE LEARNING)
Source: arXiv 1611.02683v1
Pre-trained
Pre-trained
THANKS!
rsilveira79@gmail.com
@rsilveira79
A quick example on TensorFlow