Anil Thomas
Recurrent Neural Hacks Meetup
July 16, 2016
MAKING MACHINES SMARTER.™
Using neon for pattern recognition in audio data
Outline
•  Intro to neon
•  Workshop environment setup
•  CNN theory
•  CNN hands-on
•  RNN theory
•  RNN hands-on
NEON
Neon
Backends         NervanaCPU, NervanaGPU, NervanaMGPU, NervanaEngine (internal)
Datasets         Images: ImageNet, CIFAR-10, MNIST; Captions: flickr8k, flickr30k, COCO;
                 Text: Penn Treebank, hutter-prize, IMDB, Amazon
Initializers     Constant, Uniform, Gaussian, Glorot Uniform
Learning rules   Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad
Activations      Rectified Linear, Softmax, Tanh, Logistic
Layers           Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent,
                 Long Short-Term Memory, Gated Recurrent Unit, Recurrent Sum, LookupTable
Costs            Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
Metrics          Misclassification, TopKMisclassification, Accuracy
•  Modular components
•  Extensible, OO design
•  Documentation
•  neon.nervanasys.com
Why neon?
•  Fastest according to third-party benchmarks
•  Advanced data-loading capabilities
•  Open source (including the optimized GPU kernels!)
•  Support for distributed computing
Benchmarks for RNNs¹
Data loading in neon
•  Multithreaded
•  Non-blocking I/O
•  Non-blocking decompression and augmentation
•  Supports different types of data
(We will focus on audio in this workshop; a loader sketch follows)
•  Automatic ingest
•  Can handle huge datasets
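As a rough illustration, this is how an audio loader is set up, assuming neon 1.x's DataLoader and AudioParams interface; the parameter names and values below are illustrative, not the workshop code:

# A sketch only; parameter names and values are illustrative.
from neon.data import DataLoader, AudioParams

# Describe how raw clips are decoded into spectrogram-like inputs.
params = AudioParams(sampling_freq=22050,   # resample clips to 22.05 kHz
                     clip_duration=30000,   # take 30 s of audio (in ms)
                     frame_duration=16)     # window per FFT frame (in ms)

# The loader ingests the files once, then streams batches while background
# threads overlap IO, decompression and augmentation with training.
train = DataLoader(set_name='music-train',
                   repo_dir='/home/ubuntu/nervana/music',
                   media_params=params,
                   target_size=1,           # one label per clip
                   nclasses=10)             # ten GTZAN genres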
WORKSHOP ENV SETUP
GitHub repo with source code
•  https://github.com/anlthms/meetup2
•  Music classification examples
•  To clone locally:
git clone https://github.com/anlthms/meetup2.git
Configuring EC2 instance
•  Use your own EC2 account at http://aws.amazon.com/ec2/
•  Select US West (N. California) zone
•  Search for nervana-neon10 within Community AMIs
•  Select g2.2xlarge as instance type
•  After launching the instance, log in and activate virtual env:
source ~/neon/.venv/bin/activate
Datasets
•  GTZAN music genre dataset
10 genres
100 clips in each genre
Each clip is 30 seconds long
Preloaded in the AMI at /home/ubuntu/nervana/music/
•  Whale calls dataset from Kaggle
https://www.kaggle.com/c/whale-detection-challenge/data
30,000 2-second sound clips
Preloaded in the AMI at /home/ubuntu/nervana/wdc/
CNN THEORY
Convolution

Input (3x3):     Kernel (2x2):    Output (2x2):
0 1 2            0 1              19 25
3 4 5            2 3              37 43
6 7 8

(0, 1, 3, 4) · (0, 1, 2, 3) = 19

•  Each element in the output is the result of a dot product between two vectors
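To make the arithmetic concrete, here is a minimal NumPy sketch of the example above (like most deep learning frameworks, it computes cross-correlation: the kernel is not flipped):

import numpy as np

x = np.arange(9).reshape(3, 3)   # input: [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
w = np.arange(4).reshape(2, 2)   # kernel: [[0, 1], [2, 3]]

out = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        # dot product of the 2x2 receptive field with the kernel
        out[i, j] = np.sum(x[i:i+2, j:j+2] * w)

print(out)   # [[19. 25.]
             #  [37. 43.]]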
Convolutional layer

[Figure: the 3x3 input (0–8) unrolled into nine units; each of the four output
units (19, 25, 37, 43) connects to its 2x2 receptive field through the same
kernel weights (0, 1, 2, 3)]
Convolutional layer

0×0 + 1×1 + 3×2 + 4×3 = 19

•  The weights are shared among the units.

[Figure: the first output unit computing 19 from its receptive field
(0, 1, 3, 4) and the shared weights (0, 1, 2, 3)]
Recognizing patterns

[Figure caption: "Detected the pattern!"]
[Figures (four slides): a three-channel input shown as 3x3 planes R0–R8,
G0–G8 and B0–B8; the convolution kernel spans all three channels, so each
output value sums the per-channel dot products]
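With a multi-channel input like this, the receptive field extends through every channel and the dot product sums over channels as well. A minimal NumPy sketch (random values; shapes matching the 3x3 planes above):

import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(3, 3, 3)   # (channels, height, width): the R, G, B planes
w = rng.rand(3, 2, 2)   # one 2x2 kernel slice per channel

out = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        # the dot product runs over all three channels at once
        out[i, j] = np.sum(x[:, i:i+2, j:j+2] * w)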
Max pooling

Input (3x3):     Output (2x2):
0 1 2            4 5
3 4 5            7 8
6 7 8

Max(0, 1, 3, 4) = 4

•  Each element in the output is the maximum value within the pooling window
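The same loop as in the convolution sketch, with the dot product replaced by a max over each window:

import numpy as np

x = np.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

out = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = x[i:i+2, j:j+2].max()   # maximum of the 2x2 window

print(out)   # [[4. 5.]
             #  [7. 8.]]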
CNN HANDS-ON
CNN example
•  Source at https://github.com/anlthms/meetup2/blob/master/cnn1.py
•  Command line:
./cnn1.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
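For orientation before opening the source, a small neon CNN is defined roughly as follows (a hedged sketch in the style of cnn1.py, not the actual code; layer sizes and hyperparameters are illustrative):

# A sketch only; layer sizes and hyperparameters are illustrative.
from neon.initializers import Gaussian
from neon.layers import Affine, Conv, GeneralizedCost, Pooling
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.transforms import CrossEntropyMulti, Rectlin, Softmax

init = Gaussian(scale=0.01)
layers = [Conv((5, 5, 32), init=init, activation=Rectlin()),  # fshape (H, W, K)
          Pooling(2, strides=2),
          Conv((5, 5, 64), init=init, activation=Rectlin()),
          Pooling(2, strides=2),
          Affine(nout=10, init=init, activation=Softmax())]   # ten genres

model = Model(layers=layers)
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
opt = GradientDescentMomentum(0.01, momentum_coef=0.9)
# model.fit(train, optimizer=opt, num_epochs=16, cost=cost, callbacks=callbacks)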
Sample spectrograms
Sample output of conv layer #1
Sample output of conv layer #2
RNN THEORY
Recurrent layer

[Figure: a single recurrent unit; the input x enters through weights WI, and
the output/memory h feeds back into the unit through weights WR]

ht = g(WI xt + WR h(t-1))
Recurrent layer (unrolled)

[Figure: the layer unrolled over time; inputs x1, x2, x3 feed hidden states
h1, h2, h3, with WI applied to each input and WR carrying each state forward]

ht = g(WI xt + WR h(t-1))

•  x represents the input sequence
•  h is the memory (as well as the output of the recurrent layer)
•  h is computed from the past memory and the current input
•  h summarizes the sequence up to the current time step
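The update rule above in a few lines of NumPy (bias omitted; g = tanh):

import numpy as np

def rnn_forward(x_seq, WI, WR, g=np.tanh):
    """Compute ht = g(WI xt + WR h(t-1)) for each step of a sequence."""
    h = np.zeros(WR.shape[0])     # h0: empty memory
    outputs = []
    for x_t in x_seq:
        h = g(WI @ x_t + WR @ h)  # mix the current input with past memory
        outputs.append(h)
    return np.stack(outputs)      # h1 .. hT

# Toy usage: 3 time steps of 4-dim input, 5 hidden units.
rng = np.random.RandomState(0)
hs = rnn_forward(rng.rand(3, 4), rng.rand(5, 4) * 0.1, rng.rand(5, 5) * 0.1)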
Training RNNs
•  Exploding/vanishing gradients
•  Initialization
•  Gradient clipping (sketched below)
•  Optimizers with adaptive learning rates (e.g. Adagrad)
•  LSTM to capture long-term dependencies
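Gradient clipping is the simplest of these remedies to show: rescale the gradient whenever its global norm exceeds a threshold, so no single step can explode. A minimal sketch:

import numpy as np

def clip_gradient_norm(grads, max_norm):
    """Rescale gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads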
LSTM

[Figure: an LSTM cell; the input and the input, forget and output gates
interact through multiplicative (×) and additive (+) junctions and a tanh
nonlinearity]
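The standard equations behind the figure, written as one NumPy step (biases omitted; the gates use the logistic sigmoid):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    """One LSTM step; W maps [x, h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate values
    c = f * c + i * g                             # forget old state, admit new
    h = o * np.tanh(c)                            # gate what the cell exposes
    return h, c

The additive update of the cell state c is what lets gradients flow across many time steps without vanishing.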
Applications of RNNs
•  Image captioning
•  Speech recognition
•  Machine translation
•  Time-series analysis
RNN HANDS-ON
RNN example 1
•  Source at https://github.com/anlthms/meetup2/blob/master/rnn1.py
•  Command line:
./rnn1.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
(Does not work well! This example demonstrates the challenges of training RNNs)
RNN example 2
•  Source at https://github.com/anlthms/meetup2/blob/master/rnn2.py
•  Command line:
./rnn2.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
(Uses Glorot init, gradient clipping, Adagrad and LSTM)
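Roughly, the pieces named above look like this in neon 1.x (a hedged sketch, not the actual rnn2.py; hyperparameter values are illustrative):

from neon.initializers import GlorotUniform
from neon.layers import LSTM
from neon.optimizers import Adagrad
from neon.transforms import Logistic, Tanh

init = GlorotUniform()                        # Glorot (fan-based) init
rlayer = LSTM(128, init, activation=Tanh(),   # 128 hidden units
              gate_activation=Logistic(),
              reset_cells=True)
# Adagrad adapts the learning rate per parameter; clipping bounds each update.
opt = Adagrad(learning_rate=0.01, gradient_clip_value=5)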
RNN example 3
•  Source at https://github.com/anlthms/meetup2/blob/master/rnn3.py
•  Command line:
./rnn3.py -e 16 -w /home/ubuntu/nervana/music -r 0 -v
(Uses multiple bidirectional RNN layers)
Final example
•  Source at https://github.com/NervanaSystems/neon/blob/master/examples/whale_calls.py
•  Command line:
./whale_calls.py -e 16 -w /home/ubuntu/nervana/wdc -r 0 -s whales.pkl -v
Network structure of final example
Convolution Layer 'Convolution_0': 1 x (81x49) inputs, 128 x (79x24) outputs, 0,0 padding, 1,2 stride
Activation Layer 'Convolution_0_Rectlin': Rectlin
Convolution Layer 'Convolution_1': 128 x (79x24) inputs, 256 x (77x22) outputs, 0,0 padding, 1,1 stride
BatchNorm Layer 'Convolution_1_bnorm': 433664 inputs, 1 steps, 256 feature maps
Activation Layer 'Convolution_1_Rectlin': Rectlin
Pooling Layer 'Pooling_0': 256 x (77x22) inputs, 256 x (38x11) outputs
Convolution Layer 'Convolution_2': 256 x (38x11) inputs, 512 x (37x10) outputs, 0,0 padding, 1,1 stride
BatchNorm Layer 'Convolution_2_bnorm': 189440 inputs, 1 steps, 512 feature maps
Activation Layer 'Convolution_2_Rectlin': Rectlin
BiRNN Layer 'BiRNN_0': 18944 inputs, (256 outputs) * 2, 10 steps
BiRNN Layer 'BiRNN_1': (256 inputs) * 2, (256 outputs) * 2, 10 steps
BiRNN Layer 'BiRNN_2': (256 inputs) * 2, (256 outputs) * 2, 10 steps
RecurrentOutput choice RecurrentLast : (512, 10) inputs, 512 outputs
Linear Layer 'Linear_0': 512 inputs, 32 outputs
BatchNorm Layer 'Linear_0_bnorm': 32 inputs, 1 steps, 32 feature maps
Activation Layer 'Linear_0_Rectlin': Rectlin
Linear Layer 'Linear_1': 32 inputs, 2 outputs
Activation Layer 'Linear_1_Softmax': Softmax
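In neon terms, a structure like the printout above is assembled roughly as follows (a hedged sketch patterned on the log, not the actual whale_calls.py; initializers and minor settings are abbreviated):

from neon.initializers import Gaussian, GlorotUniform
from neon.layers import Affine, Conv, DeepBiRNN, Pooling, RecurrentLast
from neon.models import Model
from neon.transforms import Rectlin, Softmax

init = Gaussian(scale=0.01)
layers = [
    Conv((3, 3, 128), init=init, activation=Rectlin(),
         strides={'str_h': 1, 'str_w': 2}),                    # Convolution_0
    Conv((3, 3, 256), init=init, batch_norm=True, activation=Rectlin()),
    Pooling(2, strides=2),                                     # Pooling_0
    Conv((2, 2, 512), init=init, batch_norm=True, activation=Rectlin()),
    DeepBiRNN(256, init=GlorotUniform(), activation=Rectlin(),
              depth=3, reset_cells=True),                      # BiRNN_0..2
    RecurrentLast(),                                           # keep last step
    Affine(32, init=init, batch_norm=True, activation=Rectlin()),
    Affine(2, init=init, activation=Softmax()),                # call / no call
]
model = Model(layers=layers)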
More info
Nervana’s deep learning tutorials:
https://www.nervanasys.com/deep-learning-tutorials/
GitHub page:
https://github.com/NervanaSystems/neon
For more information, contact:
info@nervanasys.com