Proprietary and confidential. Do not distribute.
Introduction to Deep Learning and Neon
MAKING MACHINES SMARTER.™
Kyle H. Ambert, PhD

Senior Data Scientist
May 25th, 2017
@TheKyleAmbert
Nervana Systems Proprietary
About me & Intel’s Artificial Intelligence Products Group (AIPG)
+
Together, we create production deep learning solutions in multiple
domains, while advancing the field of applied analytics and optimization.
Intel’s Interest in Analytics
• To provide the infrastructure for the fastest time-to-insight
• To create tools that enable scientists to think about their research, rather than their process
• To enable users to ask bigger questions
Bigger Data: Image: 1000 KB / picture; Audio: 5000 KB / song; Video: 5,000,000 KB / movie
Better Hardware: Transistor density doubles every 18 months; Cost / GB in 1995: $1000.00; Cost / GB in 2015: $0.03
Smarter Algorithms: Advances in neural networks leading to better accuracy in training models
Great solutions require great hardware!
[Diagram: Intel product stack]
Layers: FULL SOLUTIONS, PLATFORMS/TOOLS, FRAMEWORKS, LIBRARIES, HARDWARE (Compute, Memory/Storage, Fabric)
Products shown: Intel® MKL, Intel® MKL-DNN, Intel® DAAL, Intel Distribution, BigDL, Intel® Nervana™ Deep Learning Platform, Intel® Nervana™ Cloud, Intel® Nervana™ Graph
UNLEASHING MORE POTENTIAL
This Evening
1. Machine Learning and Data Science
2. Introduction to Deep Learning
3. Nervana!
4. Neon
5. Deep Learning Use Cases
AI? Machine Learning? Deep Learning?
Machine learning is the development and application of algorithms that can learn from data in an automated, semi-automated, or supervised setting.
Statistical Learning: algorithms which leverage statistical methods for estimating functions from examples (Naïve Bayes, SVM, GLM, tree-based, kNN)
Deep Learning: algorithms where multiple layers of neurons learn successively more complex representations of input data (CNN, RNN, DFF, RBM, LSTM)
Training: building a mathematical model based on input data
Classification (scoring): using a trained model to make predictions about new data
[Diagram: workflow stages] Ingest Data, Engineer Features, Structure Model, Clean Data, Visualize, Query/Analyze, Train Model, Deploy
A Quite Brief History of Deep Learning
• 1949: The Organization of Behavior is published (Hebb)
• 1960s: Neural networks used for binary classification
• 1970s: Neural networks' popularity dries up after not delivering on the hype
• 1980s: Backpropagation is used to train deep networks
• 1990s: Neural networks take a back seat to support vector machines, which offer nice theoretical properties and guaranteed bounds
• 2010s: Access to large datasets and more computation allows deep networks to return and achieve state-of-the-art results in speech, vision, and natural language processing
(Minsky) (Vapnik) (Hinton)
Today: Deep Learning is a fast-moving area of academic and applied analytics! There are many opportunities for new discoveries!
ML v. DL: Practical Differences
 
SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, Ensemble methods
Harrison
End-to-End Deep learning
~60 million parameters
Harrison
 
Workflows in Machine Learning
⟹ The same rules apply for deep learning!
➝ Preprocessing data
➝ Feature extraction
➝ Parsimony in model selection
⟹ How we go about some of this does change…
End-to-End Deep learning: Data Considerations
Labels: Harrison?
Transformations!
More data is always better!
Deep Learning: Networks of Artificial Neurons
 
 
 
$y_i = f\left(\sum_j w_{ij}\, x_j + b_i\right)$
$y_i$: output of unit $i$
$f$: activation function
$w_{ij}$: linear weights
$b_i$: bias unit
$x_j$: input from unit $j$
  
 
   
 
 
 
 
⟹ With an explosion of moving parts,
being able to understand and keep
track of what sort of model is being
built becomes even more important!
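A minimal sketch of a single artificial neuron, using only the quantities labelled on the slide (inputs, linear weights, a bias, an activation function). The logistic activation and the numeric values are assumptions chosen for illustration; the slide does not name a specific activation.

```python
import math

def logistic(z):
    """Logistic (sigmoid) activation, one common choice of activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values, chosen only for illustration.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1
y = neuron_output(x, w, b, logistic)
```

A layer is many such units sharing the same inputs, which is why the number of weights grows quickly with layer width.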
Practical example: recognition of handwritten digits
MNIST dataset: 70,000 images (28x28 pixels)
Goal: classify images into a digit 0-9
Input: N = 28 x 28 pixels = 784 input units
Hidden: N = 100 hidden units (user-defined parameter)
Output: N = 10 output units (one for each digit). Each unit i encodes the probability that the input image is of digit i.
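To make the size of this network concrete, the sketch below counts its trainable parameters from the three layer sizes on the slide. It assumes fully connected layers with one bias per non-input unit; the slide does not state whether biases are used.

```python
layer_sizes = [28 * 28, 100, 10]  # input, hidden, output units from the slide

def count_parameters(sizes):
    """One weight per connection plus one bias per non-input unit (biases are an assumption)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Shape of each weight matrix as (units in this layer, units in the previous layer).
weight_shapes = [(n_out, n_in) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_parameters = count_parameters(layer_sizes)
```

Under these assumptions the network has 79,510 parameters, almost all of them in the input-to-hidden weights.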
Training procedure
Input → Hidden → Output
1. Randomly seed weights
2. Forward-pass
3. Cost
4. Backward-pass
5. Update weights
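The five steps can be sketched end to end on a deliberately tiny model. The example below uses a single logistic unit as a stand-in for the full network, a binary cross-entropy cost, and a toy dataset (logical OR); all three are assumptions made to keep the sketch short, not details from the slides.

```python
import math
import random

def train(examples, epochs=200, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    n_features = len(examples[0][0])
    # 1. Randomly seed weights
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    costs = []
    for _ in range(epochs):
        total_cost = 0.0
        for x, target in examples:
            # 2. Forward pass
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            y = 1.0 / (1.0 + math.exp(-z))
            # 3. Cost (binary cross-entropy, an assumed choice)
            total_cost += -(target * math.log(y) + (1 - target) * math.log(1 - y))
            # 4. Backward pass: for this cost and activation, dC/dz = y - target
            delta = y - target
            # 5. Update weights
            weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * delta
        costs.append(total_cost / len(examples))
    return weights, bias, costs

# Toy data (hypothetical): logical OR
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
weights, bias, costs = train(data)
```

The average cost per epoch falls as training proceeds, which is the behaviour the procedure is meant to produce.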
Forward pass
Input (28x28) → Hidden → Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
Cost
Input (28x28) → Hidden → Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Cost function
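The slide's formula did not survive extraction, so the sketch below evaluates two candidate cost functions on the slide's own output and ground-truth vectors. Multiclass cross-entropy and sum of squares are both listed among neon's costs later in the deck; which one this slide used is not known, and the 1/2 factor in the sum of squares is an assumed convention.

```python
import math

output = [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
truth = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

def cross_entropy(predicted, target, eps=1e-12):
    """Multiclass cross-entropy; eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(predicted, target))

def sum_of_squares(predicted, target):
    """Half the sum of squared differences (the 1/2 factor is a convention)."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predicted, target))

ce = cross_entropy(output, truth)    # equals -ln(0.3)
sse = sum_of_squares(output, truth)
```

Both costs are positive here because the network assigns only 0.3 to the correct digit and 0.4 to a wrong one.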
 
Backward pass
Input (28x28) → Hidden → Output (10x1): [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
Ground Truth: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Cost function
$\Delta W_{i \to j}$
Back-propagation
Input
Hidden
Output  
compute
 
 
$f(z) = \max(z, 0)$
$f'(z)$
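A short sketch of the rectified linear activation and its derivative, and of how the derivative enters back-propagation through the chain rule. The function names are illustrative, and taking the derivative at exactly zero to be 0 is a convention assumed here.

```python
def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return max(z, 0.0)

def relu_prime(z):
    """Derivative of ReLU: 1 for z > 0, 0 for z < 0 (0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0

def backprop_through_relu(z, upstream_gradient):
    """Chain rule: dC/dz = dC/da * f'(z), where a = f(z)."""
    return upstream_gradient * relu_prime(z)
```

For example, `backprop_through_relu(-1.0, 5.0)` is zero: a unit that was inactive in the forward pass passes no gradient back.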
Training
fprop → cost → bprop, repeated for each training example
Gradient descent
fprop → cost → bprop, repeated for each training example
Update weights via:
$W \leftarrow W - \alpha \frac{\partial C}{\partial W}$
$\alpha$: learning rate
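The update rule can be sketched as a one-line function: move each weight against its gradient, scaled by the learning rate. The quadratic cost used to exercise it is a hypothetical stand-in with a known minimum at (3, -1); it is not from the slides.

```python
def gradient_descent_step(weights, gradients, learning_rate):
    """Move each weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# Hypothetical cost C(w) = (w0 - 3)^2 + (w1 + 1)^2, with its minimum at (3, -1).
def cost_gradient(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w = [0.0, 0.0]
for _ in range(100):
    w = gradient_descent_step(w, cost_gradient(w), learning_rate=0.1)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor, so `w` ends very close to (3, -1).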
Stochastic (minibatch) Gradient descent
minibatch #1: fprop → cost → bprop for each example, then one weight update
minibatch #2: fprop → cost → bprop for each example, then one weight update
Stochastic (minibatch) Gradient descent
Epoch 0
Epoch 1
Sample numbers:
• Learning rate ~0.001
• Batch sizes of 32-128
• 50-90 epochs
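A sketch of the minibatch loop: each epoch is one pass over the shuffled data, and the weights are updated once per minibatch using the average gradient. The defaults follow the slide's sample numbers; the scalar toy problem, and the larger learning rate used to make it converge quickly, are assumptions for illustration.

```python
import random

def minibatches(examples, batch_size, rng):
    """Shuffle once per epoch, then yield consecutive slices of `batch_size` examples."""
    shuffled = list(examples)
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

def sgd(examples, gradient, w, learning_rate=0.001, batch_size=32, epochs=50, seed=0):
    rng = random.Random(seed)
    updates = 0
    for _ in range(epochs):                      # one epoch = one full pass over the data
        for batch in minibatches(examples, batch_size, rng):
            mean_grad = sum(gradient(w, x) for x in batch) / len(batch)
            w -= learning_rate * mean_grad       # one weight update per minibatch
            updates += 1
    return w, updates

# Toy problem (hypothetical): find the w that minimizes the mean of (w - x)^2.
examples = [float(i % 10) for i in range(1000)]
w_fit, n_updates = sgd(examples, lambda w, x: 2.0 * (w - x), w=0.0, learning_rate=0.05)
```

With 1000 examples and a batch size of 32 there are 32 minibatches per epoch, so 50 epochs give 1600 weight updates.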
Why Does This Work at All?
Krizhevsky, 2012: 60 million parameters
Taigman, 2014: 120 million parameters
Nervana in 30 seconds. Possibly less.
neon deep learning framework
train, deploy, explore
nervana engine
2-3x speedup on Titan X GPUs
nervana cloud
neon framework
nervana cloud
Web Interface Command Line
Ge(i)t Neon!
1. git clone https://github.com/NervanaSystems/neon.git
2. pip install {h5py, pyaml, virtualenv}
3. brew install {opencv|opencv3}
4. make {python2|python3}
5. . .venv/bin/activate
6. examples/mnist_mlp.py
7. deactivate
⟹ https://goo.gl/jZgfNg
Documentation!
Deep learning ingredients
Dataset, Model/Layers, Activation, Optimizer, Cost
 
neon overview
Backend: NervanaGPU, NervanaCPU, NervanaMGPU
Datasets: MNIST, CIFAR-10, Imagenet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize, UCF101, flickr8k, flickr30k, COCO
Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal
Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer
Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin
Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM
Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection
Curated Models
• https://github.com/NervanaSystems/ModelZoo
• Pre-trained weights and models
SegNet
Deep Speech 2
Skip-thought
Autoencoders
Deep Dream
Neon workflow
1. Generate backend
2. Load data
3. Specify model architecture
4. Define training parameters
5. Train model
6. Evaluate
Interacting with Neon
1. Via command line
2. In a virtual environment
3. In an ipython/jupyter notebook
4. ncloud
Nervana Cloud
•Layers: convolution, rectified linear units, pooling, dropout, softmax
•Popular with 2D + depth (+ time) inputs
•Gray or RGB images
•Videos
•Synthetic aperture radar
•Spectrogram (speech)
•Layers: convolution, rectified linear units, pooling, dropout, softmax
•Use multiple copies of the same feature on the input (correlation)
•Use several features (aka kernels, filters)
•Reduces number of weights compared to fully connected
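A sketch of what "multiple copies of the same feature" means in practice: one small kernel is slid over the input (a cross-correlation), so every output position reuses the same weights. The 4x4 image and 2x2 kernel are made-up values, and the comparison assumes a fully connected layer mapping the same 16 inputs to the same 9 outputs.

```python
def correlate2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical input and kernel, chosen only for illustration.
image = [[1, 2, 0, 1],
         [0, 1, 3, 2],
         [2, 0, 1, 1],
         [1, 2, 0, 3]]
kernel = [[1, 0],
          [0, -1]]
feature_map = correlate2d(image, kernel)

shared_weights = len(kernel) * len(kernel[0])                      # weights in the kernel
n_inputs = len(image) * len(image[0])
n_outputs = len(feature_map) * len(feature_map[0])
dense_weights = n_inputs * n_outputs                               # fully connected equivalent
```

Here the convolution needs 4 weights where a fully connected layer of the same shape would need 144; using several kernels multiplies the first number, not the second.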
•Layers: convolution, rectified linear units (ReLU), pooling, dropout, softmax
•It is fast – no normalization or exponential computations
•Induces sparsity in the hidden units
 
•Layers: convolution, rectified linear units, pooling, dropout, softmax
•Downsampling
•Reduces the number of parameters
•Provides some translation invariance
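A sketch of non-overlapping max pooling, one common form of the downsampling described above. The window size of 2 and the sample feature map are assumptions. Pooling has no weights of its own; the parameter savings come from later layers seeing a smaller input.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; assumes height and width are divisible by `size`."""
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]), size)]
            for r in range(0, len(feature_map), size)]

# Hypothetical 4x4 feature map.
fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 0, 5, 6],
      [1, 2, 7, 8]]
pooled = max_pool2d(fm)
```

Moving the largest value to a different cell inside the same 2x2 window leaves `pooled` unchanged, which is the limited translation invariance the slide refers to.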
•Layers: convolution, rectified linear units, pooling, dropout, softmax
•Reduces overfitting – Prevents co-adaptation on training data
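A sketch of dropout as it is commonly implemented: during training each unit is zeroed at random, so no unit can rely on a specific partner being present. The "inverted" rescaling of surviving units by 1 / keep_prob is an assumed variant, not something stated on the slide, and the function name is illustrative.

```python
import random

def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale survivors."""
    if not training:
        return list(activations)   # at inference time every unit is kept
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
```

Rescaling keeps the expected activation the same in training and inference, so the mean of `dropped` stays close to 1.0 even though about half the units are zero.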
•Layers: convolution, rectified linear units, pooling, dropout, softmax
•aka “normalized exponential function”
•Normalizes vector to a probability distribution 
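A sketch of the normalized exponential: exponentiate each score, then divide by the total so the results are positive and sum to one. Subtracting the maximum score first is a standard numerical-stability step that does not change the result; the input scores are made up.

```python
import math

def softmax(scores):
    """Normalized exponential: maps real-valued scores to a probability distribution."""
    shift = max(scores)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest probability, and the ordering of the scores is preserved.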
Code!
DEEP LEARNING USE CASES!
Long Short-Term Memory (LSTM)
Why Recurrent Neural Networks?
Recurrent networks:
• Temporal dependencies
• Variable sequence length
Feed-forward networks (Input → Hidden → Output):
• Independence
• Fixed length
Recurrent neuron
 
 
 
 
 
 
   
RNN: what is it good for?
[Figure: RNN unrolled over four time steps]
Input: “h” “e” “l” “l”
Target: “e” “l” “l” “o”
Learned a language model!
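A sketch of how the "hello" example becomes training data for a character-level language model: each character is one-hot encoded and paired with the index of the character that follows it. The alphabetical vocabulary ordering is an assumption, so the one-hot positions need not match the figure.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]

def next_char_pairs(text):
    """Build (one-hot input, target index) pairs for next-character prediction."""
    vocab = sorted(set(text))
    char_to_index = {ch: i for i, ch in enumerate(vocab)}
    pairs = [(one_hot(char_to_index[a], len(vocab)), char_to_index[b])
             for a, b in zip(text, text[1:])]
    return vocab, pairs

vocab, pairs = next_char_pairs("hello")
```

Note that the two "l" inputs are identical but have different targets ("l" and then "o"); only a model with memory of earlier characters can tell them apart.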
RNN: what is it good for?
[Figure: RNN unrolled over four time steps]
Input: “cash” “flow” “is” “high”
Target: “flow” “is” “high” “today”
Learned a language model!
“low” “high”
RNN: what is it good for?
[Figure: RNN reading a sentence and producing a single label]
Input: “this” “movie” “was” “bad” “and” “long” <eos>
Output: NEGATIVE
RNN: what is it good for?
[Figure: RNN mapping an input sequence to an output sequence]
Input: “neon” “is” “amazing”
Output: “neon” “est” “incroyable” “!”
Long Short-Term Memory (LSTM)
Manipulate memory cell:
1. “forget” (flush the memory)
2. “input” (add to memory)
3. “output” (get from memory)
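The three memory operations map onto the three gates of an LSTM cell. The sketch below is a scalar version of the standard LSTM equations, which are assumed here because the slide's formulas did not survive extraction; the parameter dictionary and its values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. `p` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))        # 1. "forget": how much old memory to keep
    i = sigmoid(pre("input"))         # 2. "input": how much new content to add
    o = sigmoid(pre("output"))        # 3. "output": how much memory to expose
    g = math.tanh(pre("candidate"))   # new content proposed for the memory cell
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical parameters: large biases pin the gates fully open or closed.
keep = {"forget": (0.0, 0.0, 100.0), "input": (0.0, 0.0, -100.0),
        "output": (0.0, 0.0, 100.0), "candidate": (0.0, 0.0, 0.0)}
flush = dict(keep, forget=(0.0, 0.0, -100.0))
_, c_kept = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, p=keep)
_, c_flushed = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, p=flush)
```

With the forget gate open the cell carries its previous value of 0.7 forward unchanged; with it closed the memory is flushed to nearly zero.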
Example – Sentiment analysis with LSTM
“Okay, sorry, but I loved this movie. I just
love the whole 80’s genre of these kind
of movies, because you don’t see many
like this...” -~CupidGrl~
POSITIVE
The plot/writing is completely unrealistic and just dumb at
times. Bond is dressed up in a white tux on an overnight
train ride? eh, OK. But then they just show up at the
villain’s compound like nothing bad is going to happen to
them. How stupid is this Bond?
NEGATIVE
Preprocessing
“Okay, sorry, but I loved this movie. I just
love the whole 80’s genre of these kind
of movies, because you don’t see many
like this...” -~CupidGrl~
[5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1]
Out-of-Vocab (e.g. CupidGrl)
• Limit vocab size to 20,000 words
• Truncate each example to 128 words [from the left]
• Pad shorter examples to 128 words with whitespace
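A sketch of the three preprocessing rules. Several details are assumptions: id 0 is reserved for padding and id 1 for out-of-vocabulary words, "from the left" is read as dropping the earliest words and keeping the last 128, and padding is added on the left. The helper names are illustrative and are not neon functions.

```python
def build_vocab(tokenized_reviews, max_size=20000):
    """Map the most frequent words to integer ids, reserving 0 (pad) and 1 (out-of-vocab)."""
    counts = {}
    for tokens in tokenized_reviews:
        for t in tokens:
            counts[t] = counts.get(t, 0) + 1
    most_common = sorted(counts, key=lambda t: (-counts[t], t))[:max_size - 2]
    return {t: i + 2 for i, t in enumerate(most_common)}

def encode(tokens, vocab, length=128, pad_id=0, oov_id=1):
    """Convert words to ids, then truncate or pad to exactly `length` ids."""
    ids = [vocab.get(t, oov_id) for t in tokens]
    ids = ids[-length:]                            # drop words from the left if too long
    return [pad_id] * (length - len(ids)) + ids    # pad on the left if too short

# Tiny hypothetical corpus with a vocabulary capped at 6 ids.
reviews = [["i", "loved", "this", "movie"], ["this", "movie", "was", "bad"]]
vocab = build_vocab(reviews, max_size=6)
short = encode(["i", "loved", "this", "movie"], vocab, length=6)
long = encode(["this", "this", "movie", "was", "bad"], vocab, length=3)
```

Every example ends up the same length, which is what allows reviews to be stacked into fixed-size batches.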
Model
[5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1] → embedding layer (d=128) → LSTM (N=64) → RecurrentSum → Affine (N=2) → POS / NEG
Data flow
embedding layer (d=128) → LSTM (n=64) → RecurrentSum → Affine → (2, 1): POS / NEG
Data flow in batches with neon
[5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1] → embedding layer (d=128) → LSTM (n=64) → RecurrentSum → Affine → (2, bsz): POS / NEG
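A sketch that traces how the shape of the data changes from layer to layer, using the sizes on the slides. The (features, time steps, batch) ordering of the intermediate shapes is an assumed convention chosen to agree with the (2, bsz) output shown; neon's internal memory layout may differ. `recurrent_sum` illustrates collapsing the time axis by summing hidden states, which is how the layer name is interpreted here.

```python
def trace_shapes(batch_size, seq_len=128, embedding_dim=128, hidden=64, n_classes=2):
    """Shape of the data after each layer, as (features, [time steps,] batch)."""
    return [
        ("input ids", (seq_len, batch_size)),
        ("embedding", (embedding_dim, seq_len, batch_size)),
        ("LSTM", (hidden, seq_len, batch_size)),
        ("RecurrentSum", (hidden, batch_size)),
        ("Affine", (n_classes, batch_size)),
    ]

def recurrent_sum(hidden_states):
    """Sum a list of per-time-step hidden vectors into a single vector."""
    return [sum(values) for values in zip(*hidden_states)]

single = trace_shapes(batch_size=1)
batched = trace_shapes(batch_size=32)
summed = recurrent_sum([[1, 2], [3, 4], [5, 6]])
```

Only the batch dimension changes between the single-example and batched cases; the layer sizes stay fixed.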
Code! (LSTM)
More Code! (LSTM)
In Summary…
1. Deep learning methods are powerful and versatile
2. It’s important to understand how DL relates to
traditional ML methods
3. The barrier to entry for using DL in practice is lowered with the neon framework on the Nervana ecosystem
kyle.h.ambert@intel.com
@TheKyleAmbert

Introduction to Deep Learning and neon at Galvanize

  • 1. Proprietary and confidential. Do not distribute. Introduction to Deep Learning and Neon MAKING MACHINES SMARTER.™ Kyle H. Ambert, PhD
 Senior Data Scientist May 25 , 2017th @TheKyleAmbert
  • 2. Nervana Systems Proprietary About me & Intel’s Artificial Intelligence Products Group (AIPG) +
  • 3. Nervana Systems Proprietary About me & Intel’s Artificial Intelligence Products Group (AIPG) +
  • 4. Nervana Systems Proprietary About me & Intel’s Artificial Intelligence Products Group (AIPG) +
  • 5. Nervana Systems Proprietary About me & Intel’s Artificial Intelligence Products Group (AIPG) +
  • 6. Nervana Systems Proprietary About me & Intel’s Artificial Intelligence Products Group (AIPG) +
  • 7. Nervana Systems Proprietary About me & Intel’s Artificial Intelligence Products Group (AIPG) + Together, we create production deep learning solutions in multiple domains, while advancing the field of applied analytics and optimization.
  • 8. Nervana Systems Proprietary 8 Intel’s Interest in Analytics To provide the infrastructure for the fastest time-to-insight To create tools that enable scientists to think about their research, rather than their process To enable users to ask bigger questions Bigger Data Better Hardware Smarter Algorithms Image: 1000 KB / picture Audio: 5000 KB / song Video: 5,000,000 KB / movie Transistor density doubles every 18 months Cost / GB in 1995: $1000.00 Cost / GB in 2015: $0.03 Advances in neural networks leading to better accuracy in training models Great solutions require great hardware!
  • 9. Nervana Systems Proprietary LIBRARIES Intel® MKL Intel® MKL-DNN FRAMEWORKS Intel® DAAL HARDWARE Memory/Storage FabricCompute Intel Distribution MORE UNLEASHING POTENTIAL FULL SOLUTIONS PLATFORMS/TOOLS BIGDL Intel® Nervana™ Deep Learning Platform Intel® Nervana™ Cloud Intel® Nervana™ Graph
  • 10. Nervana Systems Proprietary 10 This Evening 1. Machine Learning and Data Science 2. Introduction to Deep Learning 3. Nervana! 4. Neon 5. Deep Learning Use Cases
  • 11. Nervana Systems Proprietary 11 This Evening 1. Machine Learning and Data Science 2. Introduction to Deep Learning 3. Nervana! 4. Neon 5. Deep Learning Use Cases
  • 12. Nervana Systems Proprietary 12 AI? Machine Learning? Deep Learning?
  • 13. Machine learning is the development, and application of, algorithms that can learn from data in an automated, semi-automated, or supervised setting. Deep LearningStatistical Learning Algorithms where multiple layers of neurons learn successively complex representations of input data CNN RNN DFF RBM LSTM Algorithms which leverage statistical methods for estimating functions from examples Naïve Bayes SVM GLM Tree- based kNN Training: building a mathematical model based on input data Classification (scoring): using a trained model to make predictions about new data
  • 14. Machine learning is the development, and application of, algorithms that can learn from data in an automated, semi-automated, or supervised setting. Deep LearningStatistical Learning Algorithms where multiple layers of neurons learn successively complex representations of input data CNN RNN DFF RBM LSTM Algorithms which leverage statistical methods for estimating functions from examples Naïve Bayes SVM GLM Tree- based kNN Training: building a mathematical model based on input data Classification (scoring): using a trained model to make predictions about new data
  • 15. Machine learning is the development, and application of, algorithms that can learn from data in an automated, semi-automated, or supervised setting. Deep LearningStatistical Learning Algorithms where multiple layers of neurons learn successively complex representations of input data CNN RNN DFF RBM LSTM Algorithms which leverage statistical methods for estimating functions from examples Naïve Bayes SVM GLM Tree- based kNN Training: building a mathematical model based on input data Classification (scoring): using a trained model to make predictions about new data Ingest Data Engineer
Features Structure
 Model Clean Data Visualize Query/ Analyze TrainM odel Deploy
  • 16. Nervana Systems Proprietary 16 This Evening 1. Machine Learning and Data Science 2. Introduction to Deep Learning 3. Nervana! 4. Neon 5. Deep Learning Use Cases
  • 17. Nervana Systems Proprietary 17 A Quite Brief History of Deep Learning • 1960s: Neural networks used for binary classification • 1970s: Neural networks popularity dries after not delivering on the hype • 1980s: Backpropagation is used to train deep networks • 1990s: Neural networks take the back seat to support vector machines due to the nice theoretical properties and guarantee bounds • 2010s: Access to large datasets and more computation allowed deep networks to return and have state-of-the-art results in speech, vision, and natural language processing • 1949: The Organization of Behavior is published (Hebb!) (Minsky) Today: Deep Learning is a fast-moving area of academic and applied analytics! There are many opportunities for new discoveries! (Vapnik) (Hinton)
  • 18. ML v. DL: Practical Differences. [Figure: classifiers listed: SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, ensemble methods; example output label: Harrison]
  • 19. End-to-End Deep Learning. [Figure: a network with ~60 million parameters; example output label: Harrison]
  • 20. Workflows in Machine Learning ⟹ The same rules apply for deep learning! ➝ Preprocessing data ➝ Feature extraction ➝ Parsimony in model selection ⟹ How we go about some of this does change…
  • 23. End-to-End Deep Learning: Data Considerations. Labels: Harrison? Transformations! More data is always better!
  • 24. Deep Learning: Networks of Artificial Neurons. Each unit computes $y_i = f\left(\sum_j w_{ij} x_j + b_i\right)$, where $y_i$ is the output of unit $i$, $f$ is the activation function, $w_{ij}$ are the linear weights, $b_i$ is the bias unit, and $x_j$ is the input from unit $j$. ⟹ With an explosion of moving parts, being able to understand and keep track of what sort of model is being built becomes even more important!
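A single artificial neuron can be sketched in a few lines. This is a minimal illustration, assuming the usual form (activation applied to a weighted sum of inputs plus a bias). The logistic activation and the name `neuron_output` are assumptions made for the example, not something the slide specifies.

```python
import math


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation.

    The logistic (sigmoid) activation is an assumption for illustration.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Two inputs whose weighted contributions cancel out, so z = 0.
y = neuron_output(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(y)
```

With a weighted sum of zero, the logistic activation returns its midpoint value.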
  • 25. Practical example: recognition of handwritten digits. MNIST dataset: 70,000 images (28x28 pixels). Goal: classify each image as a digit 0-9. Input: N = 28 x 28 pixels = 784 input units. Hidden: N = 100 hidden units (user-defined parameter). Output: N = 10 output units (one for each digit); each unit i encodes the probability that the input image is the digit i.
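To get a feel for the size of this network, the number of trainable parameters can be counted directly. The sketch below assumes fully connected layers with one bias per unit, which the slide implies but does not state; `count_parameters` is a hypothetical helper.

```python
# Layer sizes from the slide: 784 inputs, 100 hidden units, 10 outputs.
layer_sizes = [784, 100, 10]


def count_parameters(sizes):
    """Count weights and biases, assuming fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output unit
    return total


n_params = count_parameters(layer_sizes)
print(n_params)
```

Changing the hidden-layer size (the user-defined parameter) changes this count almost linearly, since the 784-to-hidden weights dominate.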
  • 26. Training procedure: 1. Randomly seed weights 2. Forward pass 3. Cost 4. Backward pass 5. Update weights
  • 27. Forward pass: the 28x28 input flows through the hidden layer to the output layer. Output (10x1): 0.0 0.1 0.0 0.3 0.1 0.1 0.0 0.0 0.4 0.0
  • 28. Cost: the cost function compares the network output with the ground truth. Output (10x1): 0.0 0.1 0.0 0.3 0.1 0.1 0.0 0.0 0.4 0.0. Ground truth: 0 0 0 1 0 0 0 0 0 0.
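The cost formula on this slide did not survive extraction. As a sketch only, the example below assumes a cross-entropy cost (a common choice for classification) and applies it to the output and ground-truth vectors from the slide. The small `eps` term is an assumption added to avoid taking the log of zero.

```python
import math

# Values copied from the slide.
output = [0.0, 0.1, 0.0, 0.3, 0.1, 0.1, 0.0, 0.0, 0.4, 0.0]
truth = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]


def cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, target))


cost = cross_entropy(output, truth)
predicted_digit = output.index(max(output))  # most probable class
true_digit = truth.index(1)
print(cost, predicted_digit, true_digit)
```

Here the network's most probable class differs from the true class, so the cost is well above zero; training aims to push it down.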
  • 29. Backward pass: the cost computed from the output and the ground truth is propagated backward through the network to obtain the weight changes ∆Wi→j. Output (10x1): 0.0 0.1 0.0 0.3 0.1 0.1 0.0 0.0 0.4 0.0. Ground truth: 0 0 0 1 0 0 0 0 0 0.
  • 34. Training: fprop → cost → bprop, repeated across the training data.
  • 35. Gradient descent: run fprop → cost → bprop across the training data, then update weights via $W_{i \to j} \leftarrow W_{i \to j} - \alpha \, \partial C / \partial W_{i \to j}$, where $C$ is the cost and $\alpha$ is the learning rate.
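The update rule is easiest to see on a toy problem. The sketch below assumes the standard rule (new weight = old weight minus learning rate times gradient) and applies it to a one-parameter cost C(w) = (w - 3)^2. The cost function and all names are illustrative assumptions.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly step against the gradient of the cost."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Toy cost C(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0),
                           w0=0.0, learning_rate=0.1, steps=100)
print(w_final)
```

A learning rate that is too large makes the iterates overshoot and diverge; one that is too small makes convergence slow.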
  • 36. Stochastic (minibatch) gradient descent: run fprop → cost → bprop on minibatch #1, then update the weights; repeat for minibatch #2, then update the weights; and so on.
  • 37. Stochastic (minibatch) gradient descent: one epoch is a full pass over all minibatches (Epoch 0, Epoch 1, …). Sample numbers: • Learning rate ~0.001 • Batch sizes of 32-128 • 50-90 epochs
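A minimal sketch of the minibatch loop, assuming a one-parameter linear model y = w * x trained with mean squared error. The toy data, the model, and the hyperparameter values are assumptions chosen so the example converges quickly; they are not the sample numbers from the slide.

```python
import random


def minibatches(data, batch_size, rng):
    """Shuffle the data and yield it in consecutive minibatches."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


def train(data, learning_rate=0.05, batch_size=4, epochs=50, seed=0):
    """Minibatch SGD for the model y = w * x with mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):                      # one epoch = one full pass
        for batch in minibatches(data, batch_size, rng):
            # Gradient of the batch-averaged squared error with respect to w.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad            # one update per minibatch
    return w


# Noise-free toy data generated from y = 2x.
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 21)]
w_learned = train(data)
print(w_learned)
```

The weight is updated once per minibatch rather than once per full pass, which is what distinguishes this from plain gradient descent.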
  • 38. Why Does This Work at All? Krizhevsky, 2012: 60 million parameters. Taigman, 2014: 120 million parameters.
  • 40. Nervana in 30 seconds. Possibly less. neon deep learning framework: explore, train, deploy. nervana engine: 2-3x speedup on Titan X GPUs. cloud.
  • 42. nervana cloud: Web Interface, Command Line
  • 44. Ge(i)t Neon! 1. git clone https://github.com/NervanaSystems/neon.git 2. pip install {h5py, pyaml, virtualenv} 3. brew install {opencv|opencv3} 4. make {python2|python3} 5. . .venv/bin/activate 6. examples/mnist_mlp.py 7. deactivate ⟹ https://goo.gl/jZgfNg Documentation!
  • 45. Deep learning ingredients: Dataset, Model/Layers, Activation, Optimizer, Cost
  • 46. neon overview. Backend: NervanaGPU, NervanaCPU, NervanaMGPU. Datasets: MNIST, CIFAR-10, Imagenet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize, UCF101, flickr8k, flickr30k, COCO. Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal. Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer. Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin. Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM. Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error. Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection.
  • 47. Curated Models • https://github.com/NervanaSystems/ModelZoo • Pre-trained weights and models: SegNet, Deep Speech 2, Skip-thought, Autoencoders, Deep Dream
  • 48. Neon workflow: 1. Generate backend 2. Load data 3. Specify model architecture 4. Define training parameters 5. Train model 6. Evaluate
  • 49. Interacting with Neon: 1. Via command line 2. In a virtual environment 3. In an ipython/jupyter notebook 4. ncloud
  • 56. • Layers: convolution, rectified linear units, pooling, dropout, softmax • Popular with 2D + depth (+ time) inputs • Gray or RGB images • Videos • Synthetic aperture radar • Spectrogram (speech)
  • 57. • Layers: convolution, rectified linear units, pooling, dropout, softmax • Use multiple copies of the same feature on the input (correlation) • Use several features (aka kernels, filters) • Reduces number of weights compared to fully connected
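The weight-sharing idea can be shown with a tiny pure-Python convolution (computed as a cross-correlation, as the slide's "correlation" note suggests). The sketch assumes no padding and a stride of 1. The 5x5 kernel size used in the weight comparison is an illustrative assumption, not a value from the slides.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]          # the same 4 weights are reused at every position
feature_map = conv2d_valid(image, kernel)

# Weight comparison for a 28x28 input and an assumed 5x5 kernel.
kernel_size = 5
out_side = 28 - kernel_size + 1                          # 24x24 feature map
shared_weights = kernel_size * kernel_size               # one shared kernel
dense_weights = (28 * 28) * (out_side * out_side)        # fully connected
print(feature_map, shared_weights, dense_weights)
```

Producing the same 24x24 output with a fully connected layer would need several orders of magnitude more weights than the single shared kernel.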
  • 58. • Layers: convolution, rectified linear units (ReLU), pooling, dropout, softmax • It is fast: no normalization or exponential computations • Induces sparsity in the hidden units
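A rectified linear unit simply clips negative values to zero, which is why it needs no exponentials and why many hidden activations end up exactly zero. A minimal sketch, with made-up input values:

```python
def relu(values):
    """Rectified linear unit: max(0, v) applied elementwise."""
    return [max(0.0, v) for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
# Fraction of units that are exactly zero after the activation.
sparsity = activations.count(0.0) / len(activations)
print(activations, sparsity)
```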
  • 59. • Layers: convolution, rectified linear units, pooling, dropout, softmax • Downsampling • Reduces the number of parameters • Provides some translation invariance
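As an illustration of the downsampling step, the sketch below implements max pooling with a 2x2 window and a stride of 2. The window size, the stride, and the choice of max (rather than average) pooling are assumptions for the example.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
pooled = max_pool_2x2(feature_map)   # 4x4 input becomes a 2x2 output
print(pooled)
```

The pooling layer has no weights of its own; it reduces parameters by shrinking the input that later layers must connect to. Small shifts of a feature within a window leave the pooled value unchanged, which is the source of the partial translation invariance.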
  • 60. • Layers: convolution, rectified linear units, pooling, dropout, softmax • Reduces overfitting: prevents co-adaptation on training data
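A sketch of dropout at training time, assuming the common "inverted dropout" variant, in which surviving activations are rescaled so their expected value is unchanged. The slide does not say which variant is meant, and the keep probability of 0.5 is an illustrative choice.

```python
import random


def dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob (training only).

    Survivors are divided by keep_prob (inverted dropout, an assumption)
    so the expected activation stays the same.
    """
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)                      # fixed seed for reproducibility
dropped = dropout([1.0] * 1000, keep_prob=0.5, rng=rng)
mean_activation = sum(dropped) / len(dropped)
print(mean_activation)
```

Because a different random subset of units is silenced on each pass, no unit can rely on a specific partner being present, which discourages co-adaptation.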
  • 61. • Layers: convolution, rectified linear units, pooling, dropout, softmax • aka “normalized exponential function” • Normalizes vector to a probability distribution
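The "normalized exponential" can be written directly: exponentiate each score, then divide by the sum. Subtracting the maximum score first is a standard numerical-stability step added here and does not change the result. The input scores are made up.

```python
import math


def softmax(scores):
    """Map a vector of real scores to a probability distribution."""
    m = max(scores)                           # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

The outputs are positive, sum to one, and preserve the ordering of the input scores.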
  • 63. DEEP LEARNING USE CASES! Long Short-Term Memory (LSTM)
  • 64. Why Recurrent Neural Networks? Feed-forward networks (Input → Hidden → Output) assume: • Independence • Fixed length. Recurrent networks handle: • Temporal dependencies • Variable sequence length
  • 65. Recurrent neuron
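The diagram and equation for this slide did not survive extraction. As a hedged sketch, the code below uses the textbook form of a recurrent unit, in which the new hidden state depends on the current input and on the previous hidden state. The tanh activation, scalar weights, and all names are assumptions.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One time step: combine the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x, w_h, b):
    """Unroll the recurrent unit over a sequence of any length."""
    h = 0.0
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# A single pulse at the first step, followed by zeros.
states = run_rnn([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)
```

The later states stay nonzero even though the later inputs are zero, showing how the recurrent connection carries information forward in time.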
  • 66. RNN: what is it good for? [Figure: character-level example. Inputs “h”, “e”, “l”, “l” are one-hot encoded and fed in one step at a time; at each step the network outputs a probability distribution over the next character, with targets “e”, “l”, “l”, “o”.] Learned a language model!
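The input and target sequences for the “hello” example can be built as follows. This is a sketch of the data preparation only. The alphabetical vocabulary ordering is an assumption, and the figure's own index assignment is not recoverable.

```python
text = "hello"
vocab = sorted(set(text))                         # ['e', 'h', 'l', 'o']
char_to_index = {ch: i for i, ch in enumerate(vocab)}


def one_hot(index, size):
    """Vector of zeros with a single 1 at the given index."""
    return [1 if i == index else 0 for i in range(size)]


# Each input character is paired with the character that follows it.
inputs = [one_hot(char_to_index[ch], len(vocab)) for ch in text[:-1]]
targets = [char_to_index[ch] for ch in text[1:]]
print(inputs, targets)
```

Training the network to predict each target from the inputs seen so far is what makes it a (character-level) language model.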
  • 67. RNN: what is it good for? [Figure: word-level example. Inputs “cash”, “flow”, “is”, “high” with targets “flow”, “is”, “high”, “today”; at one step the network weighs the candidates “low” and “high”.] Learned a language model!
  • 68. RNN: what is it good for? [Figure: sentiment classification. The sequence “this”, “movie”, “was”, “bad”, “and”, “long”, &lt;eos&gt; is read one word at a time and classified as NEGATIVE.]
  • 69. RNN: what is it good for? [Figure: translation. The input sequence “neon”, “is”, “amazing” is encoded, then decoded into “neon”, “est”, “incroyable”, “!”.]
  • 70. Long Short-Term Memory (LSTM). Manipulate memory cell: 1. “forget” (flush the memory) 2. “input” (add to memory) 3. “output” (get from memory)
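The three memory operations map onto the three gates of the standard LSTM cell. The sketch below assumes the textbook formulation with scalar states. The parameter names (`wf`, `uf`, `bf`, and so on) are hypothetical, and the slide's own equations are not recoverable.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (textbook form, an assumption)."""
    f = sigmoid(p["wf"] * x_t + p["uf"] * h_prev + p["bf"])    # "forget" gate
    i = sigmoid(p["wi"] * x_t + p["ui"] * h_prev + p["bi"])    # "input" gate
    o = sigmoid(p["wo"] * x_t + p["uo"] * h_prev + p["bo"])    # "output" gate
    g = math.tanh(p["wg"] * x_t + p["ug"] * h_prev + p["bg"])  # candidate
    c = f * c_prev + i * g      # flush part of the memory, add new content
    h = o * math.tanh(c)        # read from the memory
    return h, c


# With all parameters at zero every gate sits at 0.5 and the candidate is 0.
names = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
params = {name: 0.0 for name in names}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, p=params)
print(h, c)
```

In this configuration the forget gate halves the stored memory, nothing new is added, and the output gate exposes half of the squashed cell state.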
  • 71. Example: Sentiment analysis with LSTM. POSITIVE: “Okay, sorry, but I loved this movie. I just love the whole 80’s genre of these kind of movies, because you don’t see many like this...” -~CupidGrl~ NEGATIVE: “The plot/writing is completely unrealistic and just dumb at times. Bond is dressed up in a white tux on an overnight train ride? eh, OK. But then they just show up at the villain’s compound like nothing bad is going to happen to them. How stupid is this Bond?”
  • 72. Preprocessing: “Okay, sorry, but I loved this movie. I just love the whole 80’s genre of these kind of movies, because you don’t see many like this...” -~CupidGrl~ becomes [5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1], where the final 1 marks an out-of-vocab word (e.g. CupidGrl). • Limit vocab size to 20,000 words • Truncate each example to 128 words [from the left] • Pad shorter examples with whitespace up to 128 words
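The three preprocessing rules can be sketched with the standard library. Several details are assumptions: the id 1 for out-of-vocab words follows the example sequence on the slide, while the padding id 0, padding on the left, and reading "truncate from the left" as "drop the leading words" are guesses. `build_vocab` and `encode` are hypothetical helpers, not neon functions.

```python
from collections import Counter

PAD_ID = 0   # assumed padding id
OOV_ID = 1   # out-of-vocab id, as suggested by the slide's example


def build_vocab(texts, max_size):
    """Map the max_size most frequent words to ids starting at 2."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 2
            for i, (word, _) in enumerate(counts.most_common(max_size))}


def encode(text, vocab, length):
    """Convert text to a fixed-length list of word ids."""
    ids = [vocab.get(word, OOV_ID) for word in text.lower().split()]
    ids = ids[-length:]                           # keep the last `length` words
    return [PAD_ID] * (length - len(ids)) + ids   # pad short examples


vocab = build_vocab(["good good good movie movie film"], max_size=2)
short_example = encode("good film movie", vocab, length=5)
long_example = encode("good movie good movie", vocab, length=2)
print(vocab, short_example, long_example)
```

For the setup on the slide, `max_size` would be 20,000 and `length` would be 128.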
  • 73. Model: word ids [5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1] → embedding layer (d=128) → LSTM (N=64, unrolled over the sequence) → RecurrentSum → Affine (N=2) → POS / NEG
  • 74. Data flow: embedding layer (d=128) → LSTM (n=64) → RecurrentSum → Affine → output of shape (2, 1): POS / NEG
  • 75. Data flow in batches with neon: word ids [5, 4, 940, 107, 14, 672, 1790, 333, 47, 11, 7890, …, 1] → embedding layer (d=128) → LSTM (n=64) → RecurrentSum → Affine → output of shape (2, bsz): POS / NEG
  • 78. In Summary… 1. Deep learning methods are powerful and versatile 2. It’s important to understand how DL relates to traditional ML methods 3. The barrier to entry for using DL in practice is lowered with the neon framework on the Nervana ecosystem. @TheKyleAmbert