Convolutional Neural Networks
and Natural Language Processing
Thomas Delteil – github.com/thomasdelteil – linkedin.com/in/thomasdelteil
Applied Scientist @ AWS Deep Engine
Goals
- Explain what convolutions are
- Show how to handle textual data
- Analyze a reference neural network architecture for text classification
- Demonstrate how to train and deploy a CNN for Natural Language Processing
Learn Data Science, Vancouver – Deep Learning and NLP - CNNs and NLP - Thomas Delteil - github.com/thomasdelteil - linkedin.com/in/thomasdelteil
Convolutions
And where to find them
2012 - ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, Advances in Neural Information Processing Systems, 2012
AlexNet architecture
ImageNet competition
Classify images among 1000 classes:
AlexNet Top-5 error-rate, 25% => 16%!
Actual photo of the reaction from the computer vision community*
*might just be a stock photo
I told you so!
What made Convolutional Neural Networks viable?
GPUs!
- Nvidia V100, float16 ops: ~120 TFLOPS, 5000+ CUDA cores
- #1 supercomputer in 2005: ~135 TFLOPS
Source: Mathworks
Sea/Land segmentation via satellite images
DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation, Ruirui Li et al, 2017
Automatic Galaxy classification
Deep Galaxy: Classification of Galaxies based on Deep Convolutional Neural Networks, Nour Eldeen M. Khalifa, 2017
Medical Imaging, MRI, X-ray, surgical cameras
Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods, Ali Işın et al., 2016
What is a convolution?
A convolutional filter (kernel/mask) slides over an input tensor with a given stride and padding. At each window position, the filter and the window are multiplied element-wise, the products are summed across all channels, and a bias term is added.
The result is called a feature map.
Input matrix (3x3), 1 channel, no padding:
2 2 1
3 1 -1
4 3 2
Kernel (2x2), stride 1, bias = 2:
1 -1
-1 0
Feature map (2x2):
-1 2
0 1
Computation of each feature map entry:
1*2 + (-1)*2 + (-1)*3 + 0*1 + 2 = -1
1*2 + (-1)*1 + (-1)*1 + 0*(-1) + 2 = 2
1*3 + (-1)*1 + (-1)*4 + 0*3 + 2 = 0
1*1 + (-1)*(-1) + (-1)*3 + 0*2 + 2 = 1
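The worked example above can be reproduced with a few lines of plain Python. This is a minimal sketch for a single channel without padding; the function name `conv2d_single_channel` is chosen here for illustration and does not come from the slides.

```python
def conv2d_single_channel(image, kernel, bias=0, stride=1):
    """Slide `kernel` over `image` (no padding) and add `bias` to each sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = bias
            # Element-wise multiplication of the kernel with the current window.
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r * stride + i][c * stride + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [[2, 2, 1], [3, 1, -1], [4, 3, 2]]
kernel = [[1, -1], [-1, 0]]
feature_map = conv2d_single_channel(image, kernel, bias=2)
print(feature_map)  # [[-1, 2], [0, 1]]
```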
What is a convolution? Padding
Source: Machine Learning guru - Neural Networks CNN
What is a convolution? Stride = 2
Source: Machine Learning guru - Neural Networks CNN
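Padding and stride together determine the size of the feature map. The sketch below uses the usual output-size formula for one spatial dimension; the helper name `conv_output_size` is an assumption made for this example.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of window positions along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(3, 2))             # 2, the 3x3 input and 2x2 kernel example
print(conv_output_size(5, 3, padding=1))  # 5, padding of 1 keeps the input size
print(conv_output_size(5, 3, stride=2))   # 2, a stride of 2 skips every other position
```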
What is a convolution? Multi Channel
1 convolutional filter: (3)x(3x3)
Source: Machine Learning guru - Neural Networks CNN
What is a convolution? Multi Channel
Source: Convolutional Neural Networks on the iPhone with VGGNet
N: Number of input channels
W: Width of the kernel
H: Height of the kernel
M: Number of output channels
Kernel size = $N \times W \times H$
#Params = $M \times N \times W \times H + M$
256 convolutions of kernel (3,3) on 256 input channels:
256*256*3*3 + 256 = ~0.6M
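The parameter count above can be checked with a one-line helper. This is a small sketch; the function name `conv_layer_params` is hypothetical and the bias term (one per output channel) follows the formula on the slide.

```python
def conv_layer_params(n_in, width, height, m_out, use_bias=True):
    """Weights of M filters of size N x W x H, plus one bias per filter."""
    weights = m_out * n_in * width * height
    return weights + (m_out if use_bias else 0)


print(conv_layer_params(256, 3, 3, 256))                  # 590080
print(conv_layer_params(256, 3, 3, 256, use_bias=False))  # 589824
```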
Easily parallelizable
Convolution computations are:
- Independent (across filters and within a filter)
- Simple (multiplications and sums)
Why does it work?
Sharpening filter
Laplacian filter
Sobel x-axis filter
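The three filters named above are hand-designed kernels; a CNN learns similar ones from data. The sketch below applies commonly used 3x3 versions of these kernels to a toy image with a vertical edge. The kernel values are the textbook ones and are an assumption here, since the slide only shows the resulting images.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


# Commonly used 3x3 kernels (assumed values, not taken from the slide).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

# Toy grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

sobel_out = correlate2d(image, SOBEL_X)
laplacian_out = correlate2d(image, LAPLACIAN)
sharpen_out = correlate2d(image, SHARPEN)
print(sobel_out[0])      # [0, 4, 4, 0]
print(laplacian_out[0])  # [0, 1, -1, 0]
print(sharpen_out[0])    # [0, -1, 2, 1]
```

The Sobel response is zero in flat regions and large at the edge, which is the sense in which a filter "detects" a pattern.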
Why does it work?
- Detect patterns at larger and larger scales by stacking convolution layers on top of each other to grow the receptive field
- Applicable to spatially correlated data
Source: AlexNet, first 96 (11x11) learned filters represented in RGB space (3 input channels)
Growing receptive field
Source: ML Review, A guide to receptive field arithmetic
Deeper in the network
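How fast the receptive field grows can be computed layer by layer. The sketch below uses the standard receptive field recurrence (each layer adds `(kernel - 1) * jump` input positions, and the jump is multiplied by the stride); the function name and the example layer stacks are illustrative assumptions.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, first layer first."""
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # new input positions seen by this layer
        jump *= stride                  # distance between neighbouring outputs
    return rf


print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(3, 2), (3, 1)]))          # 7
```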
Visualize convolutions
http://scs.ryerson.ca/~aharley/vis/conv/flat.html
Visualize convolutions
Source: Neural Network 3D Simulation
(warning flashing lights)
State of the art networks are getting deeper and more complex
Source: Inception v3
High number of parameters => Requires a lot of data to train
Advanced types of convolutions
- Transposed convolutions (deconvolution): EnhanceNet
- Dilated convolutions: WaveNet
- Depth-wise separable convolutions: MobileNet
Source: An introduction to different types of convolutions
On to Natural Language Processing
NLP Domains
- Machine translation
- OCR
- Q&A
- Sentiment analysis
- Speech recognition
- TTS
- Topic modelling
- Information retrieval
- Natural language understanding
- Document classification
NLP Industry Facts
- 8.4PB of information per second as of 2020 (source: business2community, 2016)
- 70% of companies use customer feedback (source: business2community, 2016)
- £1.3T value of company data (source: IDC, 2014)
- 10% of organizations expect to commercialise their data by 2020 (source: Gartner, 2016)
Source: Ticary, What is natural language processing
Convolutions and Natural Language Processing
Data Representation?
Source: Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional Neural Networks for Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014
Encoding Data – Word-level
- Word-level embedding (word2vec). Word -> N-dimensional vector
Source: Convolutional Neural Networks for Sentence Classification, Yoon Kim, 2014
[Figure labels: N, time, different embeddings]
V A N C O U V E R N L P …
_ 0 0 0 0 0 0 0 0 0 1 0 0 0
- 0 0 0 0 0 0 0 0 0 0 0 0 0
. 0 0 0 0 0 0 0 0 0 0 0 0 0
A 0 1 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 1 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 1 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 0 0 0 0 0 0 0
L 0 0 0 0 0 0 0 0 0 0 0 1 0
M 0 0 0 0 0 0 0 0 0 0 0 0 0
N 0 0 1 0 0 0 0 0 0 0 1 0 0
O 0 0 0 0 1 0 0 0 0 0 0 0 0
P 0 0 0 0 0 0 0 0 0 0 0 0 1
Q 0 0 0 0 0 0 0 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 1 0 0 0 0
S 0 0 0 0 0 0 0 0 0 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0 0 0 0
U 0 0 0 0 0 1 0 0 0 0 0 0 0
V 1 0 0 0 0 0 1 0 0 0 0 0 0
W 0 0 0 0 0 0 0 0 0 0 0 0 0
X 0 0 0 0 0 0 0 0 0 0 0 0 0
Y 0 0 0 0 0 0 0 0 0 0 0 0 0
Z 0 0 0 0 0 0 0 0 0 0 0 0 0
Encoding Data – Character-level
- One-hot encoding
- Alphabet
- Sparse representation
- Character embedding
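The one-hot table above can be generated programmatically. This is a minimal sketch assuming a toy alphabet made of `_` (standing for a space), `-`, `.` and the uppercase letters, as in the rows shown; the alphabet used by the actual model is larger, and the helper name `one_hot_encode` is hypothetical.

```python
import string

# Assumed toy alphabet: '_' stands for a space, followed by '-', '.', and A-Z.
ALPHABET = "_-." + string.ascii_uppercase


def one_hot_encode(text, alphabet=ALPHABET):
    """Return a matrix with one row per alphabet symbol and one column per character."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    columns = []
    for ch in text.upper().replace(" ", "_"):
        column = [0] * len(alphabet)
        if ch in index:  # characters outside the alphabet stay all-zero
            column[index[ch]] = 1
        columns.append(column)
    # Transpose so that rows are alphabet symbols and columns are text positions.
    return [list(row) for row in zip(*columns)]


matrix = one_hot_encode("Vancouver NLP")
print(len(matrix), len(matrix[0]))    # 29 13
print(matrix[ALPHABET.index("V")])    # [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Each column contains at most a single 1, which is why the representation is sparse.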
Text classification, N categories
Neural
Network
- Fiction: 0%
- Biography: 6%
…
- Play: 80%
…
- Documentation: 0%
source: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015
Visualization with Netron
Deep Neural Network: Crepe Model
Visualization with Netron
Intuition: convolutions act similarly to n-grams
V A N C O U V E R … 1013
_ 0 0 0 0 0 0 0 0 0 1 …
- 0 0 0 0 0 0 0 0 0 0 …
. 0 0 0 0 0 0 0 0 0 0 …
A 0 1 0 0 0 0 0 0 0 0 …
B 0 0 0 0 0 0 0 0 0 0 …
C 0 0 0 1 0 0 0 0 0 0 …
D 0 0 0 0 0 0 0 0 0 0 …
E 0 0 0 0 0 0 0 1 0 0 …
F 0 0 0 0 0 0 0 0 0 0 …
G 0 0 0 0 0 0 0 0 0 0 …
H 0 0 0 0 0 0 0 0 0 0 …
I 0 0 0 0 0 0 0 0 0 0 …
J 0 0 0 0 0 0 0 0 0 0 …
K 0 0 0 0 0 0 0 0 0 0 …
L 0 0 0 0 0 0 0 0 0 0 …
M 0 0 0 0 0 0 0 0 0 0 …
N 0 0 1 0 0 0 0 0 0 0 …
O 0 0 0 0 1 0 0 0 0 0 …
P 0 0 0 0 0 0 0 0 0 0 …
Q 0 0 0 0 0 0 0 0 0 0 …
R 0 0 0 0 0 0 0 0 1 0 …
S 0 0 0 0 0 0 0 0 0 0 …
T 0 0 0 0 0 0 0 0 0 0 …
U 0 0 0 0 0 1 0 0 0 0 …
V 1 0 0 0 0 0 1 0 0 0 …
W 0 0 0 0 0 0 0 0 0 0 …
X 0 0 0 0 0 0 0 0 0 0 …
Y 0 0 0 0 0 0 0 0 0 0 …
Z 0 0 0 0 0 0 0 0 0 0 …
0 1 2 3 4 … … … … … … … … 1007
0 6.4 1.1 3.2 0.3 -0.4 … … … … … … … … …
1 -2.1 0.2 -3.4 … … … … … … … … … … …
… … … … … … … … … … … … … … …
… … … … … … … … … … … … … … …
… … … … … … … … … … … … … … …
254 … … … … … … … … … … … … … …
255 1.2 3.4 -1 1.2 3.2 … … … … … … … … …
Input: 69x1014x1 = ~70k
Output: 1x1008x256 = ~258k (256 filters x 1008 positions)
Temporal Convolution (256 69*7/1)
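The n-gram intuition can be made concrete. In the sketch below, a temporal filter whose weights are 1 at the characters of a chosen n-gram and 0 elsewhere, applied to one-hot input, simply counts matching characters at each position. The helper names and the example n-gram are assumptions made for illustration; learned filters have real-valued weights.

```python
def conv1d_output_length(input_length, kernel_width, stride=1):
    """Number of positions a temporal filter visits (no padding)."""
    return (input_length - kernel_width) // stride + 1


def ngram_filter_response(text, ngram):
    """Response of a one-hot 'n-gram filter' slid over `text`.

    Equivalent to a temporal convolution over one-hot characters whose
    weights are 1 at the n-gram's characters and 0 everywhere else.
    """
    width = len(ngram)
    responses = []
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        responses.append(sum(1 for a, b in zip(window, ngram) if a == b))
    return responses


print(conv1d_output_length(1014, 7))               # 1008
print(ngram_filter_response("VANCOUVER", "VER"))   # [1, 0, 0, 0, 0, 0, 3]
```

The response peaks (3 out of 3) exactly where the trigram occurs.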
1x1008x256 = ~258k (input)
1x1008x256 = ~258k (output)
Activation Function: Rectified Linear Unit (ReLU)
$f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}$
0 1 2 3 4 5 … 1007
0 6.4 1.1 3.2 0.3 -0.4 0.2 … …
… … … … … … … … …
255 1.2 3.4 -1 1.2 3.2 2.8 … …
0 1 2 3 4 5 … 1007
0 6.4 1.1 3.2 0.3 0 0.2 … …
… … … … … … … … …
255 1.2 3.4 0 1.2 3.2 2.8 … …
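The before and after tables correspond to applying ReLU element-wise. A minimal sketch, using the visible values of rows 0 and 255:

```python
def relu(x):
    """Rectified Linear Unit: keep non-negative values, replace negatives by 0."""
    return max(0, x)


row_0 = [6.4, 1.1, 3.2, 0.3, -0.4, 0.2]
row_255 = [1.2, 3.4, -1, 1.2, 3.2, 2.8]
relu_row_0 = [relu(v) for v in row_0]
relu_row_255 = [relu(v) for v in row_255]
print(relu_row_0)    # [6.4, 1.1, 3.2, 0.3, 0, 0.2]
print(relu_row_255)  # [1.2, 3.4, 0, 1.2, 3.2, 2.8]
```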
0 1 2 3 4 5 … 1007
0 6.4 1.1 3.2 0.3 0 0.2 … …
… … … … … … … … …
255 1.2 3.4 0 1.2 3.2 2.8 … …
0 1 … 335
0 6.4 0.3 … …
… … … … …
255 3.4 3.2 … …
Input: 1x1008x256 = ~258k
Output: 1x336x256 = ~86k (256 filters x 336 positions)
Down-sampling: Max-Pooling (256 1*3/3)
source : Stanford's CS231n
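Max-pooling with a window of 3 and a stride of 3 keeps the largest value of every group of three neighbouring positions. The sketch below applies it to the visible values of the table; the function name `max_pool_1d` is chosen for this example.

```python
def max_pool_1d(row, size=3, stride=3):
    """Keep the maximum of each window of `size` values, moving by `stride`."""
    return [max(row[i:i + size]) for i in range(0, len(row) - size + 1, stride)]


pooled_0 = max_pool_1d([6.4, 1.1, 3.2, 0.3, 0, 0.2])
pooled_255 = max_pool_1d([1.2, 3.4, 0, 1.2, 3.2, 2.8])
print(pooled_0)                          # [6.4, 0.3]
print(pooled_255)                        # [3.4, 3.2]
print(len(max_pool_1d([0.0] * 1008)))    # 336
```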
Fast forward…
1x336x256 = ~86k <- after 1 convolution layer (69*7/1) and 1 max-pooling (1*3/3)
1x330x256 = ~85k <- after 1 convolution layer (1*7/1)
1x110x256 = ~28k <- after 1 max-pooling (1*3/3)
1x102x256 = ~26k <- after 4 convolution layers (1*3/1)
1x34x256 = ~9k <- after 1 max-pooling (1*3/3)
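The sequence lengths listed above follow from applying the output-size formula layer by layer. A small sketch, with the layer list transcribed from the slides (kernel widths only; all convolutions use stride 1 and all poolings use size 3 and stride 3):

```python
def conv_length(n, kernel, stride=1):
    return (n - kernel) // stride + 1


def pool_length(n, size=3, stride=3):
    return (n - size) // stride + 1


# (layer type, kernel width) in the order given on the slides.
layers = [("conv", 7), ("pool", 3), ("conv", 7), ("pool", 3),
          ("conv", 3), ("conv", 3), ("conv", 3), ("conv", 3), ("pool", 3)]

length = 1014
trace = [length]
for kind, width in layers:
    if kind == "conv":
        length = conv_length(length, width)
    else:
        length = pool_length(length, width, width)
    trace.append(length)

print(trace)         # [1014, 1008, 336, 330, 110, 108, 106, 104, 102, 34]
print(length * 256)  # 8704 values after the last pooling layer
```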
0 1 2 3 4 5 6 7 8 … 33
0 6.4 0.1 … … … … … … … … …
1 2.1 24.9 … … … … … … … … …
… … … … … … … … … … … …
255 … … … … … … … … … … 9.9
0
0 6.4
1 0.1
… …
34 2.1
35 24.9
… …
… …
… …
8703 9.9
Input: 1x34x256 = ~9k
Output: 8704x1x1 = ~9k
Flattening Layer
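Flattening concatenates the 256 rows of 34 values into a single vector of 8704 values, so that index 34 of the vector is the first value of row 1. A minimal sketch with placeholder values (the numbers in `feature_maps` are made up so that positions are easy to check):

```python
def flatten(feature_maps):
    """Concatenate the rows of a 2D list into one flat list (row-major order)."""
    return [value for row in feature_maps for value in row]


# Placeholder input: 256 rows (filters) x 34 columns (positions).
feature_maps = [[r * 34 + c for c in range(34)] for r in range(256)]
flat = flatten(feature_maps)
print(len(flat))                         # 8704
print(flat[34] == feature_maps[1][0])    # True
```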
0
0 6.4
1 0.1
… …
8703 9.9
8704x1x1 = ~9k
x 1024 units (k = 0, 1, …, 1023)
1024x1x1 = ~1k
$y_k(x) = \sum_{i=0}^{8703} w_{ki} \, x_i + b_k$
0
0 8.7
1 -2.1
… …
1023 32.1
Fully Connected / Dense layer (1024)
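Each unit of a dense layer computes a weighted sum of all its inputs plus a bias. The sketch below shows this on a tiny layer with 3 inputs and 2 units; the weights and inputs are made-up values chosen so the result is easy to verify by hand.

```python
def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + b
        for w_row, b in zip(weights, biases)
    ]


x = [1.0, 2.0, 3.0]
weights = [[0.5, 0.0, -1.0],   # weights of unit 0
           [1.0, 1.0, 1.0]]    # weights of unit 1
biases = [0.5, -1.0]
outputs = dense(x, weights, biases)
print(outputs)  # [-2.0, 5.0]
```

The same function applied to 8704 inputs and 1024 units needs 8704 weights and one bias per unit.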
0
0 8.7
1 0
… …
1023 32.1
DROP OUT
1024x1x1 = ~1k
x 1024 units (k = 0, 1, …, 1023)
1024x1x1 = ~1k
$y_k(x) = \sum_{i=0}^{1023} w_{ki} \, x_i + b_k$
0
0 9.2
1 5.3
… …
1023 0.1
ignored
Dropout (p=0.5) + Fully Connected Layer (1024)
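Dropout randomly ignores units during training so the network cannot rely on any single one. The sketch below uses the "inverted dropout" variant, which rescales the surviving activations by 1/(1-p); that scaling choice is an assumption of this example and is not stated on the slide.

```python
import random


def dropout(x, p=0.5, training=True, rng=None):
    """Zero each value with probability p during training; identity otherwise."""
    if not training or p == 0.0:
        return list(x)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Survivors are scaled by 1/keep so the expected activation is unchanged.
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


activations = [8.7, 0.0, 32.1, 1.5]
dropped = dropout(activations, p=0.5, rng=random.Random(0))
print(dropped)
print(dropout(activations, training=False))  # unchanged at inference time
```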
0
0 6.4
1 0.1
… …
1023 9.9
1024x1x1 = ~1k
0
…
N-1
x N
Nx1x1 = N
0
0 2.7
1 0.1
… …
… …
N-1 12.5
ignored
Softmax
0
0 0.1
1 0.01
… …
… …
N-1 0.8
Nx1x1 = N
$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=0}^{N-1} e^{x_j}}$
Output: Dropout + Dense + Softmax for N categories
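Softmax turns the N raw scores of the last dense layer into probabilities that sum to 1. A minimal sketch follows; subtracting the maximum before exponentiating is a common numerical-stability trick added here, and the three example scores are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # does not change the result, avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.7, 0.1, 12.5])
print(probs)
print(sum(probs))
```

The largest score always receives the largest probability, so the predicted category is the index of the maximum.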
Text classification, N categories
Neural
Network
- Fiction: 0%
- Biography: 6%
…
- Play: 6%
…
- Documentation: 80%
How to train the network? Backward propagation!
Backward propagation – Efficient Gradient Descent
- Fiction: 0%
- Biography: 6% 0%
…
- Play: 6% 100%
…
- Documentation: 80% 0%
- Fiction: 0%
- Biography: 6%
…
- Play: 6%
…
- Documentation: 80%
Update the weights of the convolutional masks and fully
connected units so that the error will be minimized next time
Neural Network
$w_{ij} = w_{ij} - \alpha \cdot \frac{\partial E}{\partial w_{ij}}$
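The update rule can be illustrated on a toy problem with a single weight. The sketch below is an assumption-laden example: it fits y = w * x to one example (x = 2, target = 4) with a squared error, computes the gradient by hand, and applies the update repeatedly. A real network obtains the gradients through backward propagation.

```python
def sgd_step(weights, gradients, learning_rate):
    """Move every weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


# Toy problem (assumed): error E = (w * x - t)^2 with x = 2 and t = 4.
x, target = 2.0, 4.0
w = [0.0]
for _ in range(50):
    error = w[0] * x - target
    grad = [2.0 * error * x]  # dE/dw for the squared error above
    w = sgd_step(w, grad, learning_rate=0.05)

print(round(w[0], 3))  # close to 2.0, the weight that makes the error zero
```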
Training Parameters: Learning Rate
Learning rate α: how much to update the weights for every batch of documents?
Source: Towards Data Science, Gradient descent in a nutshell
Training parameters: Batch Size
Batch size: How many examples to learn from in one step?
Training parameters: Number of epochs
Number of epochs: How many times should we feed the network the entire training set?
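Batch size and number of epochs together determine how many weight updates happen during training. The sketch below shows how a dataset is cut into mini-batches and how the number of updates follows; the helper names and the tiny dataset are illustrative assumptions.

```python
import math


def iterate_minibatches(examples, batch_size):
    """Yield consecutive slices of at most `batch_size` examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def count_updates(num_examples, batch_size, num_epochs):
    """One weight update per batch, repeated for every epoch."""
    return math.ceil(num_examples / batch_size) * num_epochs


examples = list(range(10))
batches = list(iterate_minibatches(examples, batch_size=4))
print(batches)                  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(count_updates(10, 4, 3))  # 9
```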
Jupyter notebook demo – Crepe in Apache MXNet/Gluon
https://github.com/ThomasDelteil/CNN_NLP_MXNet
Results
Traditional approaches
Word-level CNN
Character-level CNN
Data Augmentation
- For images
- For text:
  - Humans to rephrase the examples
  - Synonyms
  - Similar semantic meaning
Data Augmentation
The quick brown fox jumps over the lazy dog
- quick: fast, swift, speedy
- brown: hazel, brunette, chestnut
- jumps: leaps, springs, bounds, hops
- lazy: idle, indolent, slothful
- dog: hound, pup, mutt
Data Augmentation
The quick brown fox jumps over the lazy dog
The swift brunette fox leaps over the slothful pup
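Synonym replacement is easy to automate once a synonym table is available. The sketch below hard-codes the synonyms shown on the slides; in practice the table would come from a thesaurus, and the function name `augment` is chosen for this example.

```python
import random

# Synonyms taken from the slides; a real system would use a thesaurus.
SYNONYMS = {
    "quick": ["fast", "swift", "speedy"],
    "brown": ["hazel", "brunette", "chestnut"],
    "jumps": ["leaps", "springs", "bounds", "hops"],
    "lazy": ["idle", "indolent", "slothful"],
    "dog": ["hound", "pup", "mutt"],
}


def augment(sentence, synonyms=SYNONYMS, rng=None):
    """Replace every word that has known synonyms by a randomly chosen one."""
    rng = rng or random.Random()
    return " ".join(
        rng.choice(synonyms[word]) if word in synonyms else word
        for word in sentence.split()
    )


original = "The quick brown fox jumps over the lazy dog"
augmented = augment(original, rng=random.Random(42))
print(augmented)
```

Each call with a different random state yields a new training example with a similar meaning.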
You need a large dataset
…A very large dataset!
Live Demo – Classification of product category for Amazon Reviews
https://thomasdelteil.github.io/CNN_NLP_MXNet/
- Develop model using a Jupyter notebook
- Train model on GPU instance
- Package model behind web API in a Docker container, e.g. using MXNet Model Server
- Upload container to container registry
- Deploy container to an elastic container service
- Enjoy quick and linear scaling
- Put the API behind a load balancer with SSL termination
- Enjoy :)
Workflow and Operationalization
[Diagram: GPU instance, Container Registry, Elastic Container Service, Container, Auto-scaling Load Balancer]
HTTPS request: "Loved this book"
HTTPS response:
{
  "prediction": {
    "book": 0.99
  }
}
Advanced use-cases for
Convolutions and NLP
CNN + LSTM: Spatially and Temporally Deep Neural Networks
- CNN for feature extraction
- LSTM for temporal representation
Applications:
- Video (CNN for frames, LSTM to combine them temporally)
- Text tasks
- Audio (Language detection)
Source: Combining CNN and RNN for spoken language detection
Advanced use-case: Speech Generation WaveNet
Source: DeepMind Wavenet generative model raw audio
WaveNet: Dilated Causal Convolution
Source: DeepMind Wavenet generative model raw audio
Summary
- Learned about convolutions
- Applied them to textual data
- Studied the Crepe architecture from Zhang et al. in detail
- Learned about advanced use cases and operationalization
Thank you!
Connect here
github.com/thomasdelteil
linkedin.com/in/thomasdelteil
tdelteil@amazon.com
Photo credits: https://pexels.com and https://unsplash.com/

More Related Content

PPTX
Word embedding
ShivaniChoudhary74
 
PPTX
Language models
Maryam Khordad
 
PDF
Word2Vec
hyunyoung Lee
 
PDF
Natural Language Processing (NLP)
Yuriy Guts
 
PDF
Natural language processing
National Institute of Technology Durgapur
 
PPTX
NLP
guestff64339
 
PPT
Natural Language Processing
Ila Group
 
PPTX
Natural language processing
Yogendra Tamang
 
Word embedding
ShivaniChoudhary74
 
Language models
Maryam Khordad
 
Word2Vec
hyunyoung Lee
 
Natural Language Processing (NLP)
Yuriy Guts
 
Natural language processing
National Institute of Technology Durgapur
 
Natural Language Processing
Ila Group
 
Natural language processing
Yogendra Tamang
 

What's hot (20)

PDF
Introduction to Transformers for NLP - Olga Petrova
Alexey Grigorev
 
PDF
Natural Language Processing
Jaganadh Gopinadhan
 
PPTX
Introduction For seq2seq(sequence to sequence) and RNN
Hye-min Ahn
 
PDF
Recurrent Neural Networks, LSTM and GRU
ananth
 
PPTX
A Simple Introduction to Word Embeddings
Bhaskar Mitra
 
PPT
rnn BASICS
Priyanka Reddy
 
PDF
Natural Language Processing
Toine Bogers
 
PPTX
Natural Language Processing
Adarsh Saxena
 
PPTX
Introduction to natural language processing (NLP)
Alia Hamwi
 
PPT
Natural Language Processing
Yasir Khan
 
PPTX
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
David Talby
 
PPTX
Tutorial on Question Answering Systems
Saeedeh Shekarpour
 
PDF
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Deep Learning Italia
 
PPTX
Recurrent Neural Network
Mohammad Sabouri
 
PPTX
Natural Language Processing in AI
Saurav Shrestha
 
PPTX
Natural lanaguage processing
gulshan kumar
 
PDF
Deep learning - A Visual Introduction
Lukas Masuch
 
PPTX
Natural Language Processing
Saurabh Kaushik
 
PDF
Rnn and lstm
Shreshth Saxena
 
PDF
Transformers
Anup Joseph
 
Introduction to Transformers for NLP - Olga Petrova
Alexey Grigorev
 
Natural Language Processing
Jaganadh Gopinadhan
 
Introduction For seq2seq(sequence to sequence) and RNN
Hye-min Ahn
 
Recurrent Neural Networks, LSTM and GRU
ananth
 
A Simple Introduction to Word Embeddings
Bhaskar Mitra
 
rnn BASICS
Priyanka Reddy
 
Natural Language Processing
Toine Bogers
 
Natural Language Processing
Adarsh Saxena
 
Introduction to natural language processing (NLP)
Alia Hamwi
 
Natural Language Processing
Yasir Khan
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
David Talby
 
Tutorial on Question Answering Systems
Saeedeh Shekarpour
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Deep Learning Italia
 
Recurrent Neural Network
Mohammad Sabouri
 
Natural Language Processing in AI
Saurav Shrestha
 
Natural lanaguage processing
gulshan kumar
 
Deep learning - A Visual Introduction
Lukas Masuch
 
Natural Language Processing
Saurabh Kaushik
 
Rnn and lstm
Shreshth Saxena
 
Transformers
Anup Joseph
 
Ad

Similar to Convolutional Neural Networks and Natural Language Processing (20)

PDF
Practical Deep Learning Using Tensor Flow - Sandeep Kath
Sandeep Kath
 
PPTX
Introduction to CNN
Shuai Zhang
 
PPTX
Deep learning
Aman Kamboj
 
PDF
C4_W2.pdf
machine121
 
PDF
Convolutional Neural Networks (CNN)
Gaurav Mittal
 
PPT
Introduction to Deep-Learning-CNN Arch.ppt
khandarevaibhav
 
PDF
Convolutional neural network
Yan Xu
 
PPTX
Introduction_to_Deep_learning_Standford_university by Angelica Sun
ssuser36b130
 
PDF
Talk Norway Aug2016
xavierbresson
 
PDF
dl-unit-4-deep-learning deep-learning.pdf
nandan543979
 
PPTX
Deep Computer Vision - 1.pptx
JawadHaider36
 
PPTX
BASIC CONCEPT OF DEEP LEARNING.pptx
RiteshPandey184067
 
PDF
Convolutional_neural_network mechanism.pptx.pdf
SwathiSoman5
 
PPTX
Illustrative Introductory CNN
YasutoTamura1
 
PDF
CNN Algorithm
georgejustymirobi1
 
PPTX
Deep Learning
Pierre de Lacaze
 
PPTX
Convolutional Neural Networks for Computer vision Applications
Alex Conway
 
PPT
Deep Learning approach in Machine learning
vipulkondekar
 
PPTX
Convolution Neural Network Lecture Slides
AdnanHaider234505
 
PDF
convolutional neural network and its applications.pdf
SubhamKumar3239
 
Practical Deep Learning Using Tensor Flow - Sandeep Kath
Sandeep Kath
 
Introduction to CNN
Shuai Zhang
 
Deep learning
Aman Kamboj
 
C4_W2.pdf
machine121
 
Convolutional Neural Networks (CNN)
Gaurav Mittal
 
Introduction to Deep-Learning-CNN Arch.ppt
khandarevaibhav
 
Convolutional neural network
Yan Xu
 
Introduction_to_Deep_learning_Standford_university by Angelica Sun
ssuser36b130
 
Talk Norway Aug2016
xavierbresson
 
dl-unit-4-deep-learning deep-learning.pdf
nandan543979
 
Deep Computer Vision - 1.pptx
JawadHaider36
 
BASIC CONCEPT OF DEEP LEARNING.pptx
RiteshPandey184067
 
Convolutional_neural_network mechanism.pptx.pdf
SwathiSoman5
 
Illustrative Introductory CNN
YasutoTamura1
 
CNN Algorithm
georgejustymirobi1
 
Deep Learning
Pierre de Lacaze
 
Convolutional Neural Networks for Computer vision Applications
Alex Conway
 
Deep Learning approach in Machine learning
vipulkondekar
 
Convolution Neural Network Lecture Slides
AdnanHaider234505
 
convolutional neural network and its applications.pdf
SubhamKumar3239
 
Ad

Recently uploaded (20)

PPTX
Web dev -ppt that helps us understand web technology
shubhragoyal12
 
PPTX
White Blue Simple Modern Enhancing Sales Strategy Presentation_20250724_21093...
RamNeymarjr
 
PPT
From Vision to Reality: The Digital India Revolution
Harsh Bharvadiya
 
PDF
717629748-Databricks-Certified-Data-Engineer-Professional-Dumps-by-Ball-21-03...
pedelli41
 
PPTX
Databricks-DE-Associate Certification Questions-june-2024.pptx
pedelli41
 
PPTX
Presentation (1) (1).pptx k8hhfftuiiigff
karthikjagath2005
 
PPTX
Fuzzy_Membership_Functions_Presentation.pptx
pythoncrazy2024
 
PPTX
Blue and Dark Blue Modern Technology Presentation.pptx
ap177979
 
PPTX
Future_of_AI_Presentation for everyone.pptx
boranamanju07
 
PPTX
Pipeline Automatic Leak Detection for Water Distribution Systems
Sione Palu
 
PPTX
Data-Driven Machine Learning for Rail Infrastructure Health Monitoring
Sione Palu
 
PDF
Classifcation using Machine Learning and deep learning
bhaveshagrawal35
 
PDF
202501214233242351219 QASS Session 2.pdf
lauramejiamillan
 
PPTX
Introduction to Data Analytics and Data Science
KavithaCIT
 
PPTX
short term internship project on Data visualization
JMJCollegeComputerde
 
PDF
TIC ACTIVIDAD 1geeeeeeeeeeeeeeeeeeeeeeeeeeeeeer3.pdf
Thais Ruiz
 
PDF
WISE main accomplishments for ISQOLS award July 2025.pdf
StatsCommunications
 
PDF
Blue Futuristic Cyber Security Presentation.pdf
tanvikhunt1003
 
PPTX
short term project on AI Driven Data Analytics
JMJCollegeComputerde
 
PDF
Fundamentals and Techniques of Biophysics and Molecular Biology (Pranav Kumar...
RohitKumar868624
 
Web dev -ppt that helps us understand web technology
shubhragoyal12
 
White Blue Simple Modern Enhancing Sales Strategy Presentation_20250724_21093...
RamNeymarjr
 
From Vision to Reality: The Digital India Revolution
Harsh Bharvadiya
 
717629748-Databricks-Certified-Data-Engineer-Professional-Dumps-by-Ball-21-03...
pedelli41
 
Databricks-DE-Associate Certification Questions-june-2024.pptx
pedelli41
 
Presentation (1) (1).pptx k8hhfftuiiigff
karthikjagath2005
 
Fuzzy_Membership_Functions_Presentation.pptx
pythoncrazy2024
 
Blue and Dark Blue Modern Technology Presentation.pptx
ap177979
 
Future_of_AI_Presentation for everyone.pptx
boranamanju07
 
Pipeline Automatic Leak Detection for Water Distribution Systems
Sione Palu
 
Data-Driven Machine Learning for Rail Infrastructure Health Monitoring
Sione Palu
 
Classifcation using Machine Learning and deep learning
bhaveshagrawal35
 
202501214233242351219 QASS Session 2.pdf
lauramejiamillan
 
Introduction to Data Analytics and Data Science
KavithaCIT
 
short term internship project on Data visualization
JMJCollegeComputerde
 
TIC ACTIVIDAD 1geeeeeeeeeeeeeeeeeeeeeeeeeeeeeer3.pdf
Thais Ruiz
 
WISE main accomplishments for ISQOLS award July 2025.pdf
StatsCommunications
 
Blue Futuristic Cyber Security Presentation.pdf
tanvikhunt1003
 
short term project on AI Driven Data Analytics
JMJCollegeComputerde
 
Fundamentals and Techniques of Biophysics and Molecular Biology (Pranav Kumar...
RohitKumar868624
 

Convolutional Neural Networks and Natural Language Processing

  • 1. Convolutional Neural Networks and Natural Language Processing Thomas Delteil – github.com/thomasdelteil – linkedin.com/in/thomasdelteil Applied Scientist @ AWS Deep Engine
  • 2. Goals § Explain what convolutions are § Show how to handle textual data § Analyze a reference neural network architecture for text classification § Demonstrate how to train and deploy a CNN for Natural Language Processing Learn Data Science, Vancouver – Deep Learning and NLP - CNNs and NLP - Thomas Delteil - github.com/thomasdelteil - linkedin.com/in/thomasdelteil
  • 3. Convolutions, and where to find them
  • 4. 2012: ImageNet Classification with Deep Convolutional Neural Networks. AlexNet architecture. Source: ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, Advances in Neural Information Processing Systems, 2012
  • 5. ImageNet competition: classify images among 1000 classes. AlexNet top-5 error rate: 25% => 16%!
  • 6. Actual photo of the reaction from the computer vision community* (*might just be a stock photo)
  • 7. I told you so!
  • 8. What made Convolutional Neural Networks viable?
  • 9. GPUs! Nvidia V100: ~120 TFLOPS for float16 operations, 5000+ CUDA cores. #1 supercomputer in 2005: ~135 TFLOPS. Source: MathWorks
  • 10. Sea/land segmentation via satellite images. Source: DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation, Ruirui Li et al., 2017
  • 11. Automatic galaxy classification. Source: Deep Galaxy: Classification of Galaxies based on Deep Convolutional Neural Networks, Nour Eldeen M. Khalifa, 2017
  • 12. Medical imaging: MRI, X-ray, surgical cameras. Source: Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods, Ali Işın et al., 2016
  • 13. What is a convolution? It is the cross-channel sum of the element-wise multiplication of a convolutional filter (kernel/mask) computed over a sliding window on an input tensor given a certain stride and padding, plus a bias term. The result is called a feature map. Example: input matrix (3x3, no padding, 1 channel) [[2, 2, 1], [3, 1, -1], [4, 3, 2]]; kernel (2x2) [[1, -1], [-1, 0]]; stride 1; bias = 2; feature map (2x2) [[-1, 2], [0, 1]]. Top-left: 1*2 - 1*2 - 1*3 + 0*1 + 2 = -1. Top-right: 1*2 - 1*1 - 1*1 + 0*(-1) + 2 = 2. Bottom-left: 1*3 - 1*1 - 1*4 + 0*3 + 2 = 0. Bottom-right: 1*1 - 1*(-1) - 1*3 + 0*2 + 2 = 1.
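The worked example on this slide can be checked with a few lines of plain Python. This is a minimal single-channel sketch, not the implementation used in the talk; the function name `conv2d` and its parameters are illustrative assumptions, and like most deep learning libraries it does not flip the kernel.

```python
def conv2d(image, kernel, bias=0.0, stride=1, padding=0):
    """Single-channel 2D convolution as used in CNNs (no kernel flip)."""
    # Zero-pad the input on every side.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]

    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(padded) - kh) // stride + 1
    out_w = (width - kw) // stride + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of the element-wise product of the kernel and the current window.
            total = sum(
                kernel[a][b] * padded[i * stride + a][j * stride + b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total + bias)
        feature_map.append(row)
    return feature_map


image = [[2, 2, 1], [3, 1, -1], [4, 3, 2]]
kernel = [[1, -1], [-1, 0]]
print(conv2d(image, kernel, bias=2))  # [[-1, 2], [0, 1]]
```

The `stride` and `padding` arguments cover the two variations shown on the following slides: a larger stride skips window positions, and padding surrounds the input with zeros before sliding the kernel.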
  • 14. What is a convolution? Padding. Source: Machine Learning guru - Neural Networks CNN
  • 15. What is a convolution? Stride = 2. Source: Machine Learning guru - Neural Networks CNN
  • 16. What is a convolution? Multi-channel: 1 convolutional filter of shape (3)x(3x3). Source: Machine Learning guru - Neural Networks CNN
  • 17. What is a convolution? Multi-channel. N: number of input channels; W: width of the kernel; H: height of the kernel; M: number of output channels. Kernel size = $N \times W \times H$. #Params = $M \times N \times W \times H + M$. Example: 256 convolutions of kernel (3,3) on 256 input channels: 256*256*3*3 = ~0.6M weights. Source: Convolutional Neural Networks on the iPhone with VGGNet
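The parameter-count formula can be turned into a small helper. This is a sketch under the slide's definitions (N input channels, M output channels, kernel W x H, one bias per output channel); the function name is a hypothetical choice.

```python
def conv_param_count(in_channels, out_channels, kernel_w, kernel_h, bias=True):
    """Number of learnable parameters of one convolution layer."""
    weights = out_channels * in_channels * kernel_w * kernel_h
    return weights + (out_channels if bias else 0)


# 256 filters of size 3x3 applied to 256 input channels.
print(conv_param_count(256, 256, 3, 3, bias=False))  # 589824
print(conv_param_count(256, 256, 3, 3))              # 590080
```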
  • 18. Easily parallelizable. Convolution computations are: - Independent (across filters and within a filter) - Simple (multiplications and sums)
  • 19. Why does it work? Sharpening filter, Laplacian filter, Sobel x-axis filter
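To make the filter examples concrete, here is a small sketch that applies hand-crafted kernels to a toy image. The kernel values below are common textbook versions and are an assumption, since the slide only shows the filtered pictures; `correlate2d` is a hypothetical helper name.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no bias)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


# Common textbook kernels (assumed, not taken from the slide).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

# Toy 4x6 image: dark left half (0), bright right half (9).
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]
print(correlate2d(image, SOBEL_X))  # [[0, 36, 36, 0], [0, 36, 36, 0]]
```

The Sobel x-axis kernel responds only where the intensity changes from left to right, which is the sense in which a convolution filter detects a pattern. A CNN learns such kernels from data instead of having them designed by hand.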
  • 20. Why does it work? - Detect patterns at larger and larger scales by stacking convolution layers on top of each other to grow the receptive field - Applicable to spatially correlated data. Source: the first 96 (11x11) filters learned by AlexNet, represented in RGB space (3 input channels)
  • 21. Growing receptive field deeper in the network. Source: ML Review, A guide to receptive field arithmetic
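The growth of the receptive field can be computed layer by layer. The sketch below uses the standard recurrence (each layer adds `(kernel - 1) * jump` input positions, and the jump is multiplied by the stride); the function name and the layer lists are illustrative assumptions.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, input side first."""
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # new input positions seen by this layer
        jump *= stride                  # distance between adjacent outputs, in input units
    return rf


print(receptive_field([(3, 1)]))          # 3
print(receptive_field([(3, 1)] * 3))      # 7
print(receptive_field([(3, 2), (3, 1)]))  # 7
```

Three stacked 3-wide layers see 7 input positions, and a strided layer makes every later layer grow the receptive field faster.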
  • 22. Visualize convolutions: http://scs.ryerson.ca/~aharley/vis/conv/flat.html
  • 23. Visualize convolutions. Source: Neural Network 3D Simulation (warning: flashing lights)
  • 24. State-of-the-art networks are getting deeper and more complex. Source: Inception v3
  • 25. High number of parameters => requires a lot of data to train
  • 26. Advanced types of convolutions: - Transposed convolutions (deconvolution): EnhanceNet - Dilated convolutions: WaveNet - Depth-wise separable convolutions: MobileNet. Source: An introduction to different types of convolutions
  • 27. On to Natural Language Processing
  • 28. NLP domains: machine translation, OCR, Q&A, sentiment analysis, speech recognition, TTS, topic modelling, information retrieval, natural language understanding, document classification
  • 29. NLP industry facts: - 8.4 PB of information per second as of 2020 (source: business2community, 2016) - 70% of companies use customer feedback (source: business2community, 2016) - £1.3T value of company data (source: IDC, 2014) - 10% of organizations expect to commercialise their data by 2020 (source: Gartner, 2016). Source: Ticary, What is natural language processing
  • 30. Convolutions and Natural Language Processing
  • 31. Data representation? Source: Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu, Convolutional Neural Networks for Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014
  • 32. Encoding data, word-level: word-level embedding (word2vec), Word -> N-dimensional vector. N time different embeddings. Source: Convolutional Neural Networks for Sentence Classification, Yoon Kim, 2014
  • 33. Encoding Data – Character-level: - One-hot encoding - Alphabet - Sparse representation - Character embedding. Example: the text "VANCOUVER_NLP…" one-hot encoded over the alphabet (_ - . A to Z), with one row per alphabet symbol and one column per character position. Rows containing a 1 (0-based positions): V: 0, 6; A: 1; N: 2, 10; C: 3; O: 4; U: 5; E: 7; R: 8; _: 9; L: 11; P: 12. All other entries are 0.
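The one-hot table on this slide can be generated programmatically. This is a minimal sketch assuming the 29-symbol alphabet shown in the table (`_`, `-`, `.`, A to Z), with spaces mapped to `_` and text upper-cased; both normalisation choices and the function name are assumptions.

```python
import string

ALPHABET = "_-." + string.ascii_uppercase  # 29 symbols, as in the slide's table


def one_hot_encode(text, alphabet=ALPHABET):
    """Return a len(alphabet) x len(text) matrix of 0/1 values."""
    # Assumption: spaces become "_"; characters outside the alphabet stay all-zero.
    normalized = text.upper().replace(" ", "_")
    return [[1 if ch == symbol else 0 for ch in normalized] for symbol in alphabet]


matrix = one_hot_encode("Vancouver NLP")
print(matrix[ALPHABET.index("V")])  # [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(matrix[ALPHABET.index("N")])  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```

Each column contains exactly one 1, which is why the representation is sparse: most of the matrix is zeros.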
  • 34. Text classification, N categories
  • 35. Text classification, N categories. Neural network output: - Fiction: 0% - Biography: 6% … - Play: 80% … - Documentation: 0%
  • 36. Deep Neural Network: Crepe Model. Visualization with Netron. Intuition: convolutions act similarly to n-grams. Source: Xiang Zhang, Junbo Zhao, Yann LeCun, Character-level Convolutional Networks for Text Classification, NIPS 2015
  • 37. Temporal Convolution (256 69*7/1). Input: one-hot encoded text ("VANCOUVER_…", character positions 0 to 1013), 69x1014x1 = ~70k values. Output: 256 feature maps (0 to 255), each with 1008 positions (0 to 1007), 1x1008x256 = ~256k values. Sample output values: feature map 0: 6.4, 1.1, 3.2, 0.3, -0.4, …; feature map 1: -2.1, 0.2, -3.4, …; feature map 255: 1.2, 3.4, -1, 1.2, 3.2, …
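A temporal convolution slides each filter along the character axis while covering every alphabet row at once. The sketch below uses toy sizes (3 symbols, filter width 2) rather than the 69x7 filters of the slide, and a hand-written filter that fires on the bigram "AB" to illustrate the n-gram intuition; all names are illustrative assumptions.

```python
def temporal_conv(x, filters, biases):
    """x: channels x length; filters: n_filters x channels x width (stride 1, no padding)."""
    length = len(x[0])
    outputs = []
    for weights, bias in zip(filters, biases):
        width = len(weights[0])
        outputs.append([
            bias + sum(
                weights[c][k] * x[c][t + k]
                for c in range(len(x))
                for k in range(width)
            )
            for t in range(length - width + 1)
        ])
    return outputs


text, alphabet = "ABCAB", "ABC"
x = [[1 if ch == symbol else 0 for ch in text] for symbol in alphabet]

# Hand-written filter: "A" at offset 0 followed by "B" at offset 1.
ab_filter = [[1, 0], [0, 1], [0, 0]]
print(temporal_conv(x, [ab_filter], [0]))  # [[2, 0, 0, 2]]
```

The output peaks exactly where "AB" starts. In the trained network the 256 filters are learned, so each one ends up scoring a character pattern that is useful for the classification task.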
  • 38. Activation Function: Rectified Linear Unit (ReLU), $f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}$. The shape is unchanged: 1x1008x256 = ~256k -> 1x1008x256 = ~256k. Sample values before: feature map 0: 6.4, 1.1, 3.2, 0.3, -0.4, 0.2, …; feature map 255: 1.2, 3.4, -1, 1.2, 3.2, 2.8, … After: feature map 0: 6.4, 1.1, 3.2, 0.3, 0, 0.2, …; feature map 255: 1.2, 3.4, 0, 1.2, 3.2, 2.8, …
  • 39. Down-sampling: Max-Pooling (256 1*3/3). 1x1008x256 = ~256k -> 1x336x256 = ~86k. Sample values before: feature map 0: 6.4, 1.1, 3.2, 0.3, 0, 0.2, …; feature map 255: 1.2, 3.4, 0, 1.2, 3.2, 2.8, … After (positions 0 to 335): feature map 0: 6.4, 0.3, …; feature map 255: 3.4, 3.2, … Source: Stanford's CS231n
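The activation and down-sampling steps can be reproduced on the sample values from the slides. This is a minimal sketch on a single feature map; the function names are illustrative.

```python
def relu(values):
    """Replace negative values with 0, keep the rest."""
    return [max(0, v) for v in values]


def max_pool_1d(values, size=3, stride=3):
    """Keep the maximum of each window of `size` values."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, stride)]


feature_map_0 = [6.4, 1.1, 3.2, 0.3, -0.4, 0.2]
feature_map_255 = [1.2, 3.4, -1, 1.2, 3.2, 2.8]

print(relu(feature_map_0))                   # [6.4, 1.1, 3.2, 0.3, 0, 0.2]
print(max_pool_1d(relu(feature_map_0)))      # [6.4, 0.3]
print(max_pool_1d(relu(feature_map_255)))    # [3.4, 3.2]
```

With a window of 3 and a stride of 3 the windows do not overlap, so the length of every feature map is divided by 3 (1008 -> 336).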
  • 40. Fast forward… - 1x336x256 = ~86k <- after 1 convolution layer (69*7/1) and 1 max-pooling (1*3/3) - 1x330x256 = ~85k <- after 1 convolution layer (1*7/1) - 1x110x256 = ~28k <- 1 max-pooling (1*3/3) - 1x102x256 = ~26k <- 4 convolution layers (1*3/1) - 1x34x256 = ~9k <- 1 max-pooling (1*3/3)
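The sequence lengths listed on this slide follow from the usual output-size rule `(length - kernel) // stride + 1` for unpadded layers. The sketch below traces them for the layer stack described in the talk; the layer list is transcribed from the slides and the helper name is an assumption.

```python
def output_length(length, kernel, stride=1):
    """Output length of an unpadded 1D convolution or pooling layer."""
    return (length - kernel) // stride + 1


layers = (
    [("conv", 7, 1), ("pool", 3, 3), ("conv", 7, 1), ("pool", 3, 3)]
    + [("conv", 3, 1)] * 4
    + [("pool", 3, 3)]
)

length = 1014  # number of input characters
trace = [length]
for _, kernel, stride in layers:
    length = output_length(length, kernel, stride)
    trace.append(length)

print(trace)         # [1014, 1008, 336, 330, 110, 108, 106, 104, 102, 34]
print(length * 256)  # 8704 values reach the flattening layer
```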
  • 41. Flattening Layer. 1x34x256 = ~9k -> 8704x1x1 = ~9k. The 256 feature maps of 34 values each (feature map 0: 6.4, 0.1, …; feature map 1: 2.1, 24.9, …; feature map 255: …, 9.9) are concatenated into a single vector: index 0: 6.4, index 1: 0.1, …, index 34: 2.1, index 35: 24.9, …, index 8703: 9.9
  • 42. Fully Connected / Dense layer (1024). 8704x1x1 = ~9k -> 1024x1x1 = ~1k. Each output unit k (0 to 1023) computes $y_k = \sum_{i=0}^{8703} w_{ki} \cdot x_i + b_k$. Sample input: index 0: 6.4, index 1: 0.1, …, index 8703: 9.9. Sample output: index 0: 8.7, index 1: -2.1, …, index 1023: 32.1
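A dense layer is one weighted sum per output unit. The sketch below implements that sum for small toy vectors; the weights are made-up illustration values, not values from the model.

```python
def dense(x, weights, biases):
    """weights: one row of len(x) values per output unit; biases: one value per output unit."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


# Toy example: 3 inputs, 2 output units (illustrative values).
print(dense([1.0, 2.0, 3.0], [[1, 0, -1], [0.5, 0.5, 0.5]], [0.0, 1.0]))  # [-2.0, 4.0]

# Parameter count of the layer on the slide: 8704 inputs, 1024 outputs.
print(8704 * 1024 + 1024)  # 8913920
```

Most of the parameters of the network sit in this first dense layer, far more than in any single convolution layer.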
  • 43. Dropout (p=0.5) + Fully Connected Layer (1024). 1024x1x1 = ~1k -> 1024x1x1 = ~1k. Units dropped by dropout are set to 0 and ignored. Each output unit k (0 to 1023) computes $y_k = \sum_{i=0}^{1023} w_{ki} \cdot x_i + b_k$. Sample input after dropout: index 0: 8.7, index 1: 0, …, index 1023: 32.1. Sample output: index 0: 9.2, index 1: 5.3, …, index 1023: 0.1
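Dropout can be sketched in a few lines. The version below is "inverted" dropout, which rescales the surviving units during training so that nothing needs to change at inference time; that rescaling is a common implementation choice and an assumption here, since the slide only shows units being zeroed.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each value with probability p during training; identity at inference."""
    if not training:
        return list(values)
    keep = 1.0 - p
    # Surviving units are scaled by 1/keep so the expected sum is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [8.7, -2.1, 32.1, 0.4]
print(dropout(activations, rng=random.Random(0)))  # some units zeroed, the others doubled
print(dropout(activations, training=False))        # [8.7, -2.1, 32.1, 0.4]
```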
  • 44. Output: Dropout + Dense + Softmax for N categories. 1024x1x1 = ~1k -> Nx1x1 = N -> Nx1x1 = N. $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=0}^{N-1} e^{x_j}}$. Sample input: index 0: 6.4, index 1: 0.1, …, index 1023: 9.9 (dropped units are ignored). Dense output: index 0: 2.7, index 1: 0.1, …, index N-1: 12.5. Softmax output: index 0: 0.1, index 1: 0.01, …, index N-1: 0.8
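The softmax turns the N raw scores of the last dense layer into probabilities that sum to 1. This is a minimal sketch; subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.7, 0.1, 12.5])
print(probabilities)
print(sum(probabilities))  # 1.0, up to floating-point rounding
```

The category with the largest score always receives the largest probability, so the predicted class is simply the index of the maximum.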
  • 45. Text classification, N categories. Neural network output: - Fiction: 0% - Biography: 6% … - Play: 6% … - Documentation: 80%
  • 46. How to train the network? Backward propagation!
  • 47. Backward propagation – Efficient Gradient Descent. Network output vs. expected label: - Fiction: 0% - Biography: 6% vs. 0% … - Play: 6% vs. 100% … - Documentation: 80% vs. 0%. Update the weights of the convolutional masks and fully connected units so that the error will be minimized next time: $w_{ij} = w_{ij} - \alpha \cdot \frac{\partial E}{\partial w_{ij}}$, where $\alpha$ is the learning rate and $E$ the error.
  • 48. Training parameters: Learning Rate. Learning rate $\alpha$: how much to update the weights for every batch of documents? Source: Towards Data Science, Gradient descent in a nutshell
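The weight update rule and the effect of the learning rate can be illustrated on a one-parameter toy problem. This sketch minimises the assumed error function E(w) = (w - 3)^2, whose gradient is 2(w - 3); it stands in for the network's real error, which backward propagation differentiates with respect to every weight.

```python
def gradient_descent(gradient, w, learning_rate=0.1, steps=100):
    """Repeatedly apply w = w - learning_rate * dE/dw."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


def toy_gradient(w):
    # Gradient of the toy error E(w) = (w - 3)**2, minimised at w = 3.
    return 2 * (w - 3)


good = gradient_descent(toy_gradient, w=0.0, learning_rate=0.1)
too_large = gradient_descent(toy_gradient, w=0.0, learning_rate=1.1, steps=50)
print(good)       # very close to 3
print(too_large)  # far from 3: each step overshoots the minimum
```

A small learning rate converges slowly, while a learning rate that is too large makes every update overshoot and the error grow.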
  • 49. Training parameters: Batch Size. Batch size: how many examples to learn from in one step?
  • 50. Training parameters: Number of epochs. Number of epochs: how many times should we feed the network the entire training set?
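Batch size and number of epochs together define the shape of the training loop. The sketch below only produces the batches (the model update is left out); shuffling the data at every epoch is a common practice and an assumption here, and the generator name is hypothetical.

```python
import random


def iterate_minibatches(dataset, batch_size, epochs, rng):
    """Yield (epoch, batch) pairs, visiting the whole dataset once per epoch."""
    for epoch in range(epochs):
        order = list(range(len(dataset)))
        rng.shuffle(order)  # new random order at every epoch
        for start in range(0, len(order), batch_size):
            yield epoch, [dataset[i] for i in order[start:start + batch_size]]


documents = list(range(10))  # stand-in for 10 training documents
batches = list(iterate_minibatches(documents, batch_size=4, epochs=2, rng=random.Random(0)))
print(len(batches))  # 6: three batches (4 + 4 + 2 documents) per epoch, two epochs
```

In a real training loop each yielded batch would go through a forward pass, backward propagation and one weight update.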
  • 51. Jupyter notebook demo – Crepe in Apache MXNet/Gluon: https://github.com/ThomasDelteil/CNN_NLP_MXNet
  • 52. Results: traditional approaches, word-level CNN, character-level CNN
  • 53. Data Augmentation. For images. For text: - Humans to rephrase the examples - Synonyms - Similar semantic meaning
  • 54. Data Augmentation: The quick brown fox jumps over the lazy dog
  • 55. Data Augmentation: The quick brown fox jumps over the lazy dog. Synonyms: quick -> fast, swift, speedy; brown -> hazel, brunette, chestnut; jumps -> leaps, springs, bounds, hops; lazy -> idle, indolent, slothful; dog -> hound, pup, mutt
  • 56. Data Augmentation: The quick brown fox jumps over the lazy dog. Synonyms: quick -> fast, swift, speedy; brown -> hazel, brunette, chestnut; jumps -> leaps, springs, bounds, hops; lazy -> idle, indolent, slothful; dog -> hound, pup, mutt
  • 57. Data Augmentation: The quick brown fox jumps over the lazy dog. Synonyms: quick -> fast, swift, speedy; brown -> hazel, brunette, chestnut; jumps -> leaps, springs, bounds, hops; lazy -> idle, indolent, slothful; dog -> hound, pup, mutt. Augmented example: The swift brunette fox leaps over the slothful pup
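Synonym-based augmentation can be sketched as a random word substitution. The thesaurus below is copied from the slides; the function name, the `replace_prob` parameter and the whitespace tokenisation are simplifying assumptions.

```python
import random

SYNONYMS = {
    "quick": ["fast", "swift", "speedy"],
    "brown": ["hazel", "brunette", "chestnut"],
    "jumps": ["leaps", "springs", "bounds", "hops"],
    "lazy": ["idle", "indolent", "slothful"],
    "dog": ["hound", "pup", "mutt"],
}


def augment(sentence, synonyms=SYNONYMS, replace_prob=0.5, rng=random):
    """Replace each word that has synonyms with a random one, with probability replace_prob."""
    words = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < replace_prob:
            word = rng.choice(options)
        words.append(word)
    return " ".join(words)


sentence = "The quick brown fox jumps over the lazy dog"
for seed in range(3):
    print(augment(sentence, rng=random.Random(seed)))
```

Each call produces a sentence with the same label and roughly the same meaning, which enlarges the training set without new human annotation.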
  • 58. You need a large dataset
  • 59. …A very large dataset!
  • 60. Live Demo – Classification of product category for Amazon Reviews: https://thomasdelteil.github.io/CNN_NLP_MXNet/
  • 61. Workflow and Operationalization: - Develop the model using a Jupyter notebook - Train the model on a GPU instance - Package the model behind a web API in a Docker container, e.g. using MXNet Model Server - Upload the container to a container registry - Deploy the container to an elastic container service - Enjoy quick and linear scaling - Put the API behind a load balancer with SSL termination - Enjoy :) Diagram components: GPU instance, Container Registry, Elastic Container Service, Auto-scaling, Load Balancer, Container. HTTPS request: "Loved this book". HTTPS response: { "prediction": { "book": 0.99 } }
  • 62. Advanced use-cases for Convolutions and NLP
  • 63. CNN + LSTM: Spatially and Temporally Deep Neural Networks - CNN for feature extraction - LSTM for temporal representation. Applications: - Video (CNN for frames, LSTM to combine them temporally) - Text tasks - Audio (language detection). Source: Combining CNN and RNN for spoken language detection
  • 64. Advanced use-case: Speech Generation with WaveNet. Source: DeepMind, WaveNet: a generative model for raw audio
  • 65. WaveNet: Dilated Causal Convolution. Source: DeepMind, WaveNet: a generative model for raw audio
  • 66. WaveNet: Dilated Causal Convolution. Source: DeepMind, WaveNet: a generative model for raw audio
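A dilated causal convolution can be sketched for a single channel as follows. "Causal" means the output at time t only uses samples at or before t, and "dilated" means the filter taps are spaced `dilation` steps apart. This is an illustrative toy, not WaveNet's actual implementation; the function name and the zero-padding of the past are assumptions.

```python
def dilated_causal_conv(signal, weights, dilation=1):
    """weights[0] applies to the current sample, weights[k] to the sample k*dilation steps back."""
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            index = t - k * dilation
            if index >= 0:  # samples before the start are treated as zeros
                total += w * signal[index]
        output.append(total)
    return output


signal = [1, 2, 3, 4, 5]
print(dilated_causal_conv(signal, [1, 1], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]

# Receptive field of stacked width-2 layers with dilations 1, 2, 4, 8.
print(1 + sum((2 - 1) * d for d in [1, 2, 4, 8]))  # 16
```

Doubling the dilation at every layer makes the receptive field grow exponentially with depth, which is what lets the model cover long stretches of raw audio with few layers.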
  • 67. Summary § Learned about convolutions § Applied them to textual data § Studied the Crepe architecture from Zhang et al. in detail § Learned about advanced use cases and operationalization
  • 68. Thank you! Connect here: github.com/thomasdelteil, linkedin.com/in/thomasdelteil. Photo credits: https://pexels.com and https://unsplash.com/