Intro to Deep Learning
for Computer Vision
Nadav Carmel
Highlights
‱ Common CV tasks
‱ CNNs – intro
  ‱ Filters, max-pools, simple example
  ‱ Normalization types
  ‱ Inception network
‱ Object detection
  ‱ R-CNN
  ‱ YOLO
‱ Face recognition
  ‱ One-shot learning
  ‱ Siamese net
Computer Vision problems
‱ Image classification
‱ Object detection
‱ Face recognition
‱ Segmentation
‱ Style transfer
‱ But there are more!
We’ll focus on these
CNNs – intro
Convolutional filter concept – recap
‱ Each conv layer has 4 parameters:
  ‱ Filter size (filter height = filter width)
  ‱ Stride
  ‱ Input channels
  ‱ Output channels (number of filters)
6 convolutional filters
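As a rough illustration of how the four parameters determine the size of a layer, the sketch below computes the output size and the number of learnable weights. The function names, the `padding` argument and the 32x32 RGB input are assumptions made for this example; they do not come from the slides.

```python
def conv_output_size(in_size: int, filter_size: int, stride: int, padding: int = 0) -> int:
    """Spatial size (height or width) after a convolution."""
    return (in_size + 2 * padding - filter_size) // stride + 1


def conv_param_count(filter_size: int, in_channels: int, out_channels: int) -> int:
    """Learnable parameters: one f x f x C_in kernel plus one bias per output channel."""
    return (filter_size * filter_size * in_channels + 1) * out_channels


# Hypothetical layer: 32x32 RGB input, six 5x5 filters, stride 1, no padding.
out_hw = conv_output_size(32, filter_size=5, stride=1)
params = conv_param_count(filter_size=5, in_channels=3, out_channels=6)
print(out_hw, params)  # 28 456
```

The output has one channel per filter, so this hypothetical layer maps a 32x32x3 input to a 28x28x6 output.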
Max-pool concept – recap
‱ Usually has 2 parameters:
  ‱ Filter size (filter height = filter width)
  ‱ Stride
‱ The operation is applied per channel (thus $C_{in} = C_{out}$)
‱ It is a non-learnable filter
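A minimal pure-Python sketch of max-pooling, written to show that each channel is pooled independently and that nothing is learned. The nested-list layout `image[channel][row][col]` and the function names are assumptions for this example.

```python
def max_pool_2d(channel, size, stride):
    """Max-pool one channel given as a list of rows."""
    h, w = len(channel), len(channel[0])
    out = []
    for top in range(0, h - size + 1, stride):
        row = []
        for left in range(0, w - size + 1, stride):
            window = [channel[top + i][left + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        out.append(row)
    return out


def max_pool(image, size=2, stride=2):
    """Pool every channel independently, so the channel count is unchanged."""
    return [max_pool_2d(channel, size, stride) for channel in image]


# Hypothetical input: 2 channels, each 4x4.
image = [
    [[1, 3, 2, 0], [4, 6, 1, 1], [0, 2, 9, 8], [1, 1, 7, 5]],
    [[5, 5, 0, 0], [5, 5, 0, 1], [2, 0, 3, 3], [0, 2, 3, 4]],
]
pooled = max_pool(image)
print(pooled)  # [[[6, 2], [2, 9]], [[5, 1], [2, 4]]]
```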
Deep learning for Computer Vision intro
Normalization types
‱ We sometimes want the data at each layer to be normalized
‱ This improves learning speed and robustness
‱ There are a few types of normalization, which differ in the axes they normalize over:
  ‱ H, W: image height and width
  ‱ N: batch size
  ‱ C: channels
Batch Norm
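To make the axes concrete, here is a small sketch of batch normalization at training time: for each channel, the mean and variance are taken over N, H and W. The nested-list layout `x[n][c][h][w]` and the `eps` value are assumptions for this example, and the learnable scale and shift that batch norm normally applies afterwards are left out for brevity.

```python
import math


def batch_norm(x, eps=1e-5):
    """Normalize x[n][c][h][w] per channel, using statistics over N, H and W."""
    n_channels = len(x[0])
    out = [[None] * n_channels for _ in x]
    for c in range(n_channels):
        # Gather every value of channel c across the whole batch.
        values = [v for sample in x for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(x):
            out[n][c] = [[(v - mean) * scale for v in row] for row in sample[c]]
    return out


# Hypothetical batch: N=2, C=1, H=1, W=2.
x = [[[[1.0, 2.0]]], [[[3.0, 4.0]]]]
y = batch_norm(x)
```

Other variants reuse the same arithmetic over different axes: layer norm computes the statistics over C, H and W for each sample, and instance norm over H and W for each sample and channel.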
Inception network:
‱ There are many architectural questions when designing CNNs:
  ‱ What filter size to choose? (a larger one = better spatial representation, a smaller one = lower computational complexity)
  ‱ Add max-pool or not?
  ‱ Etc.
‱ One approach to handling these questions is: let’s try everything!
Single Inception block
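A sketch of the final step of an Inception block, where the outputs of the parallel branches are concatenated along the channel axis. The branch outputs here are dummy feature maps rather than real convolutions, and all names and sizes are assumptions for this example.

```python
def dummy_map(channels, height, width, value):
    """Stand-in for a branch output: a list of `channels` HxW maps."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(branch_outputs):
    """Stack feature maps along the channel axis; H and W must match."""
    height, width = len(branch_outputs[0][0]), len(branch_outputs[0][0][0])
    merged = []
    for fmap in branch_outputs:
        for channel in fmap:
            if (len(channel), len(channel[0])) != (height, width):
                raise ValueError("all branches must share height and width")
            merged.append(channel)
    return merged


# Hypothetical branches (e.g. 1x1, 3x3 and 5x5 convolutions) on a 4x4 map.
branches = [dummy_map(2, 4, 4, 0.1), dummy_map(3, 4, 4, 0.2), dummy_map(1, 4, 4, 0.3)]
block_out = concat_channels(branches)
print(len(block_out), len(block_out[0]), len(block_out[0][0]))  # 6 4 4
```

Concatenation only works if every branch preserves the spatial size, which is why the branches of such a block typically pad their inputs so that H and W are unchanged.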
Full Inception network
Object detection
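Object detection
‱ Some of the most important CV tasks include:
  ‱ Object detection = classification + localization of multiple objects
‱ Detection model output: $y = (p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3)$
  ‱ $p_c$: confidence that an object is present; $(b_x, b_y, b_h, b_w)$: box center, height and width; $c_1, c_2, c_3$: class indicators (three classes in this example)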
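The output vector can be illustrated by building a training target for one image (or one grid cell). This is a sketch under assumptions: the function name is hypothetical, the box is given as `(b_x, b_y, b_h, b_w)`, and filling the vector with zeros when no object is present is one common convention, not something stated in the slides.

```python
def make_target(box=None, class_index=None, num_classes=3):
    """Build y = [p_c, b_x, b_y, b_h, b_w, c_1..c_K]."""
    if box is None:
        # No object: p_c = 0; the other entries are placeholders here.
        return [0.0] * (5 + num_classes)
    b_x, b_y, b_h, b_w = box
    one_hot = [1.0 if k == class_index else 0.0 for k in range(num_classes)]
    return [1.0, b_x, b_y, b_h, b_w] + one_hot


print(make_target((0.5, 0.7, 0.3, 0.4), class_index=1))
# [1.0, 0.5, 0.7, 0.3, 0.4, 0.0, 1.0, 0.0]
print(make_target())
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```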
Common algorithms:
Region-based CNN (R-CNN)
‱ A region proposal algorithm (selective search) suggests candidate regions for the bounding boxes (Ross Girshick et al.)
‱ These candidate boxes are resized to match the CNN input size
‱ They are then fed into the convolutional neural network, which produces a feature vector
‱ The feature vector is fed into an SVM to produce the classification
‱ Finally, boxes that overlap heavily with a higher-scoring box are removed in a process called non-max suppression
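The resizing step can be sketched as a crop followed by nearest-neighbour resampling to a fixed size. This is only an illustration: the proposal box is hard-coded instead of coming from selective search, the image is a single-channel list of rows, and the function name and box format `(top, left, height, width)` are assumptions. Real pipelines would use a library resize routine.

```python
def crop_and_resize(image, box, out_size):
    """Crop box = (top, left, height, width) from a 2-D image and resize it
    to out_size x out_size with nearest-neighbour sampling."""
    top, left, height, width = box
    return [
        [
            image[top + (i * height) // out_size][left + (j * width) // out_size]
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


# Hypothetical 4x4 image whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(4)] for r in range(4)]
patch = crop_and_resize(image, (1, 1, 2, 2), out_size=4)
print(patch[0])  # [11, 11, 12, 12]
print(patch[3])  # [21, 21, 22, 22]
```

Every proposal ends up with the same shape regardless of its original aspect ratio, which is what lets a single CNN process all of them.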
You Only Look Once - YOLO
‱ Region-based detectors such as R-CNN use regions to localize the object within the image, and do not look at the complete image
‱ In YOLO, a single convolutional network uses the entire image to predict the bounding boxes and the class probabilities for these boxes
YOLO algo description
1. Split the image into a grid of cells
2. Each cell is responsible for predicting a number of bounding boxes (enough to cover the number of objects expected in a cell)
3. Run the model once to get all cells' predictions
4. Remove boxes that overlap heavily with a higher-scoring box, in a process called non-max suppression
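Steps 1 and 2 rely on assigning each object to a grid cell. A common way to do this, sketched below, is to pick the cell that contains the center of the object's box. The function name, the 448x448 image and the 7x7 grid are assumptions chosen for illustration.

```python
def responsible_cell(center_x, center_y, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell containing the box centre, plus the
    centre's offset measured in cell units from that cell's top-left corner."""
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    # Clamp so that a centre lying exactly on the far edge stays in the grid.
    col = min(int(center_x // cell_w), grid_size - 1)
    row = min(int(center_y // cell_h), grid_size - 1)
    offset_x = center_x / cell_w - col
    offset_y = center_y / cell_h - row
    return row, col, offset_x, offset_y


# Hypothetical object centred at (200, 100) in a 448x448 image with a 7x7 grid.
print(responsible_cell(200, 100, 448, 448, 7))  # (1, 3, 0.125, 0.5625)
```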
YOLO
YOLO summary
‱ Main algorithm properties include:
  ‱ Extremely fast inference: it makes predictions with a single network evaluation, unlike R-CNN, which requires thousands for a single image
  ‱ Since it looks at the whole image at once, its predictions are informed by global context in the image
  ‱ Requires bounding-box annotations for training
  ‱ Since the model learns to predict bounding boxes $(b_x, b_y, b_h, b_w)$ from the data, it struggles to generalize to objects in new or unusual aspect ratios or configurations
Non-max suppression algorithm
‱ Remove all boxes with $p_c < 0.6$
‱ While there are any remaining boxes:
  ‱ Pick the box with the largest $p_c$ and output it as a prediction
  ‱ Discard any remaining box with IoU > 0.5 with the box picked in the previous step
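The steps above can be sketched in a few lines of Python. The thresholds 0.6 and 0.5 are the ones from the slide; the corner box format `(x1, y1, x2, y2)`, the `(p_c, box)` tuples and the function names are assumptions for this example.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.6, iou_threshold=0.5):
    """detections: list of (p_c, box). Returns the kept detections."""
    # Drop low-confidence boxes, then process the rest from highest p_c down.
    remaining = sorted(
        (d for d in detections if d[0] >= score_threshold),
        key=lambda d: d[0],
        reverse=True,
    )
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Discard boxes that overlap the chosen box too much.
        remaining = [d for d in remaining if iou(best[1], d[1]) <= iou_threshold]
    return kept


detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # overlaps the first box (IoU about 0.68)
    (0.7, (20, 20, 30, 30)),  # separate object
    (0.5, (0, 0, 10, 10)),    # below the score threshold
]
kept = non_max_suppression(detections)
print(kept)  # [(0.9, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
```

With several classes, this procedure is usually run once per class so that overlapping objects of different classes do not suppress each other.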
Face recognition
One shot learning
‱ Say we want a classification system that recognizes faces
‱ We only have 1 or 2 images of each person
‱ One approach is to train a CNN that maps the input to a (one-hot) label vector, where each element corresponds to one person
‱ BUT:
  ‱ Training a neural net with only 1 or 2 images per class will overfit heavily
  ‱ Each new person in the ‘pool’ will require a new, longer output vector (and system retraining)
One shot learning
‱ Instead of learning a multiclass classifier, we can learn a similarity function: $\rho(\mathrm{img}_1, \mathrm{img}_2)$
  ‱ If $\rho(\mathrm{img}_1, \mathrm{img}_2) \geq \tau$ → "same"
  ‱ Else → "different"
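A sketch of how such a similarity function supports one-shot recognition: each person is stored as a single reference vector, and a query is matched against the stored references. It assumes the images have already been mapped to embedding vectors, uses cosine similarity as a stand-in for the learned function, and picks an arbitrary threshold of 0.8; all names and values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def identify(query, gallery, tau=0.8):
    """Return the best-matching name, or None if no similarity reaches tau."""
    best_name, best_score = None, tau
    for name, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical embeddings, one reference per person.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
gallery["carol"] = [0.0, 0.0, 1.0]  # enrolling a new person needs no retraining

print(identify([0.9, 0.1, 0.0], gallery))  # alice
print(identify([0.5, 0.5, 0.5], gallery))  # None
```

Adding a person only adds an entry to the gallery, which avoids the growing output vector and the retraining that the classifier approach would require.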
Siamese network
‱ We want a network that predicts the similarity between 2 faces
‱ We want each of the images to be encoded in a low-dimensional representation, which is then fed into the network
‱ The most common encoding in this case is computed via a Siamese network
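The defining property of a Siamese network is that both inputs pass through the same encoder with the same weights. The toy sketch below uses a single linear layer as the encoder and Euclidean distance between the embeddings; the encoder, the weights and the 4-pixel "images" are assumptions made purely for illustration.

```python
import math


def encode(image, weights):
    """Toy encoder: one linear layer mapping a flattened image to an embedding."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]


def siamese_distance(img1, img2, weights):
    """Encode both inputs with the SAME weights, then compare the embeddings."""
    e1, e2 = encode(img1, weights), encode(img2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


# Hypothetical 4-pixel "images" and a 2x4 weight matrix.
weights = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
d_same = siamese_distance([1, 1, 0, 0], [1, 1, 0, 0], weights)
d_diff = siamese_distance([1, 1, 0, 0], [0, 0, 1, 1], weights)
print(d_same)  # 0.0
print(d_same < d_diff)  # True
```

Note that this sketch returns a distance, so smaller values mean more similar inputs; a similarity score would be thresholded in the opposite direction.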
Triplet loss
‱ We want to train the Siamese nets in such a way that:
  ‱ Different images of the same person will have very similar representations
  ‱ Images of different persons will have very different representations
‱ We define a triplet loss objective: $L = \max\big(d(a, p) - d(a, n) + \text{margin},\ 0\big)$, where $a$ is an anchor image, $p$ a positive (same person) and $n$ a negative (different person)
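A direct transcription of the triplet loss for a single triplet of embeddings. The slides do not specify the distance `d` or the margin value; squared Euclidean distance and a margin of 0.2 are assumptions chosen for this sketch.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(d(a, p) - d(a, n) + margin, 0)."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


# Easy triplet: the negative is already far away, so the loss is zero.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])
# Hard triplet: the negative is almost as close as the positive.
hard = triplet_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0])
print(easy)        # 0.0
print(hard > 0.0)  # True
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate this condition contribute to training.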
Thank you!

Editor's Notes

  • #13: Filter concat = channel concat
  • #14: Each yellow block = label