Introduction talk to Computer Vision

‫ממוחשבת‬ ‫לראיה‬ ‫מבוא‬
‫שגיב‬ ‫חן‬
‫ומנכ‬ ‫מייסדת‬"‫בע‬ ‫שגיבטק‬ ‫משותפת‬ ‫לית‬"‫מ‬

‫ההרצאה‬ ‫תוכן‬
•‫ממוחשבת‬ ‫ראיה‬–‫טוב‬ ‫זה‬ ‫ולמה‬ ‫זה‬ ‫מה‬?
•‫בסיסיים‬ ‫דברים‬ ‫כמה‬
•‫ממוחשבת‬ ‫ראיה‬ ‫של‬ ‫קלסיקה‬
•‫ה‬ ‫מהפיכת‬-AI
•‫הממוחשבת‬ ‫הראיה‬ ‫לעולם‬ ‫כניסה‬

‫ממוחשבת‬ ‫ראיה‬ ‫מהי‬?
•‫כלים‬ ‫בעזרת‬ ‫מתמונות‬ ‫מידע‬ ‫והוצאת‬ ‫שיפור‬
‫ממוחשב‬ ‫ועיבוד‬ ‫מתמטיים‬
•‫מטרות‬:
–‫האנושי‬ ‫לצופה‬ ‫התמונה‬ ‫של‬ ‫מיטבית‬ ‫הצגה‬
–‫מידע‬ ‫והפקת‬ ‫תמונות‬ ‫של‬ ‫ממוחשבת‬ ‫אנליזה‬

‫ממוחשבת‬ ‫בראיה‬ ‫משתמשים‬ ‫היכן‬?
•‫רפואה‬
•‫אוטונומיים‬ ‫רכבים‬
•‫רבודה‬ ‫ומציאות‬ ‫מדומה‬ ‫מציאות‬
•‫תעשיה‬:‫פגמים‬ ‫איבחון‬
•‫חברתיות‬ ‫רשתות‬
•‫בטחונית‬ ‫תעשיה‬,‫אבטחה‬
‫ואכיפת‬‫חוק‬
•‫חקלאות‬
•‫וטיפוח‬ ‫אופנה‬
Taken from: http://www.cvl.isy.liu.se/

‫דיגיטלית‬ ‫תמונה‬ ‫מהי‬?

‫דיגיטלית‬ ‫תמונה‬ ‫נוצרת‬ ‫איך‬?
•‫ה‬"‫אופטיקה‬"-‫הגל‬ ‫לאורך‬ ‫שרגיש‬ ‫סנסור‬ ‫ידי‬ ‫על‬ ‫תמונה‬ ‫רכישת‬
‫המתאים‬:‫נראה‬ ‫אור‬,US,‫אינפרא‬-‫אדום‬
•‫שהתקבלה‬ ‫לאנרגיה‬ ‫פרופורציונלי‬ ‫חשמלי‬ ‫אות‬ ‫מייצר‬ ‫הסנסור‬
•‫האנלוגי‬ ‫האות‬ ‫דגימת‬
•DSP-‫בצבע‬ ‫טיפול‬,‫שגויים‬ ‫פיקסלים‬,‫רעש‬

‫דיגיטלית‬ ‫תמונה‬ ‫נוצרת‬ ‫איך‬?

‫האנושית‬ ‫הראיה‬ ‫מערכת‬
‫מ‬ ‫לקוח‬-Wikipedia

‫מרחבית‬ ‫דגימה‬
Taken from Digital Image Processing, Gonzalez

‫נייקוויסט‬ ‫ותדר‬ ‫הדגימה‬ ‫משפט‬
•‫דגום‬ ‫אות‬ ‫לשחזר‬ ‫כדי‬ ‫כי‬ ‫קובע‬ ‫זה‬ ‫משפט‬,‫קצב‬
‫האות‬ ‫מתדר‬ ‫כפליים‬ ‫לפחות‬ ‫להיות‬ ‫צריך‬ ‫הדגימה‬
‫הדגום‬.‫נקרא‬ ‫זה‬ ‫דגימה‬ ‫קצב‬:‫נייקויסט‬ ‫תדר‬

‫צריך‬ ‫אפור‬ ‫רמות‬ ‫כמה‬?
•‫ב‬ ‫משתמשים‬ ‫הצגה‬ ‫לצרכי‬-8‫ביט‬
•‫לצרכי‬‫בנות‬ ‫בתמונות‬ ‫מטפלים‬ ‫תמונה‬ ‫ניתוח‬10-
12‫יותר‬ ‫ואף‬ ‫ביט‬

Taken from Digital Image Processing, Gonzalez

‫בתמונה‬ ‫רעשים‬ ‫ניקוי‬

Linear Denoising Filters
Speckle Noise Gaussian Noise Salt & Pepper Noise

Median Filter
Speckle Noise Gaussian Noise Salt & Pepper Noise

‫שפות‬ ‫ומציאת‬ ‫סגמנטציה‬

‫סגמנטציה‬ ‫מבוססת‬ ‫מה‬ ‫על‬?
•‫אזורים‬ ‫בין‬ ‫מעבר‬ ‫יש‬ ‫בהן‬ ‫נקודות‬ ‫או‬ ‫השפה‬ ‫חיפוש‬–
‫שפות‬ ‫מבוססות‬ ‫שיטות‬
•‫שבו‬ ‫שהפיקסלים‬ ‫אזור‬ ‫הגדרת‬"‫דומים‬"‫לזה‬ ‫זה‬–
‫אזורים‬ ‫מבוססות‬ ‫שיטות‬

Gray-level profile
First derivative
Second derivative

Approximations
 Sobel
 Prewitt
-1-2-1
000
121
10-1
20-2
10-1
-1-1-1
000
111
10-1
10-1
10-1

Introduction talk to Computer Vision

Global Processing: The Hough Transform
•‫ע‬ ‫לתיאור‬ ‫ניתן‬ ‫בתמונה‬ ‫ישר‬ ‫כל‬"‫משוואה‬ ‫י‬.
•‫קווים‬ ‫אינסוף‬ ‫לעבור‬ ‫יכולים‬ ‫הישר‬ ‫על‬ ‫נקודה‬ ‫כל‬ ‫דרך‬
•‫בהתמרת‬Hough‫נקודה‬ ‫כל‬"‫מצביעה‬"‫הקווים‬ ‫עבור‬
‫דרכה‬ ‫לעבור‬ ‫שיכולים‬
•‫קולות‬ ‫הרבה‬ ‫הכי‬ ‫עם‬ ‫הישר‬-‫מנצח‬!

SIFT
Scale Invariant Transform

A motivating application
Building a panorama
• We need to match/align/register images
Taken from PPT of M. Brown and D. Lowe, University of British
Columbia

Building a panorama
1) Detect feature points in both images
Columbia

Building a panorama
1. Detect feature points in both images
2. Find corresponding pairs
Columbia

Building a panorama
1. Detect feature points in both images
2. Find corresponding pairs
3. Find a parametric transformation (e.g. homography)
4. Warp (right image to left image)
Taken from PPT of M. Brown and D. Lowe, University of British Columbia

Scale-Space
     , , , , ,L x y G x y I x y  
   2 2 2
2
2
1
, ,
2
x y
G x y e



 


DoG
http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
Taken from David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.

Scale-Space Extrema Detection
• X is selected if it is larger or smaller than all 26
neighbors

Keypoint Localization
Threshold on minimal contrast
Threshold on ratio of
principal curvatures

Orientation assignment
• Create weighted histogram of
local gradient directions
computed at selected scale
• Assign canonical orientation at
peak of smoothed histogram
• For location of multiple peaks
multiply key point

Keypoint descriptor
https://www.youtube.com/watch?v=FsFC8sCpDSw

• If you’ve been to a concert recently, you’ve probably seen how many people take videos of the
event with mobile phone cameras
• Each user has only one video – taken from one angle and location and of only moderate quality
Mobile Crowdsourcing Video Scene
Reconstruction

Creation of the 3D Video Sequence
The scene is photographed by
several people using their cell
phone camera
The video data is
transmitted via the
cellular network to a
High Performance
Computing server.
Following time
synchronization, resolution
normalization and spatial
registration, the several videos
are merged into a 3-D video
cube.
TIME

Feature detection +
Matching
Fundamental matrix
estimation
Global registration

• Precise : epipolar matching is both fast and accurate
• Dense multi-scale description of the images using binary descriptors
3D Model Reconstruction

• Precise : epipolar matching is both fast and accurate
• Empirical probability density check to discard false positives at occlusion
points
Correct match : max peak above other local max
Wrong match : max peak similar to other local max

• Robust : works even with a minimal set of inputs
• two viewpoints already sufficient for dense reconstruction
• very few erroneous points
3Dreconstruction

3D Visualizer for Dynamic Scenes
moving unknown

‫המחשב‬ ‫מן‬ ‫האדם‬ ‫מותר‬?
Kanizsa

‫ה‬ ‫מהפיכת‬-AI

• Image classification revolutionized by DL
– ImageNet – from 27% to ~5% in three years
What happened in ImageNet 2012 ?

• In this case, our classifier would have a decision boundary
more complex than the simple straight line.
• All the training patterns would be separated perfectly.
Learning Methods - Supervised

• Simpler recognizer  better performance on novel patterns.
• This is one of the central problems in statistical pattern
recognition.

• Feature extraction
 Discriminative features
 Invariant features with respect to translation, rotation and scale.
• Classification
 Use a feature vector provided by a feature extractor to assign the
object to a category

• A multi layered Neural Network (NN)
– With non linearity
• The input to the DL NN is presented at the input layer
– Images, sound, laguage...
• Hidden layers extracts increasingly abstract features
• Output layer contains the result
Neural Networks

• More complex neural networks with multiple layers and multiple output neurons are
theoretically capable of separation using any continuous surface.
• The straight line depicts the separation achieved by a simple Perceptron and the curve the
separation by a multi-layered network (left), which is in theory able to learn any separating
function.
Learning methods
Supervised

What can you do with Machine & Deep Learning ?
• Train classifiers for specific recognition tasks
• Localization
– fully convolutional connected networks
– train networks, e.g. RCNN, YOLO

The task: What object category do we see in the image?
Benchmark: 1000 category “ImageNet” dataset.
(from Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep
Convolutional Neural Networks, NIPS, 2012.)
Image Classification

2012 AlexNet results were what jump started the current neural network wave.
Current results (networks with tens/hundreds of layers) perform better than humans on this task!
Image Classification - continued

Task: Find bounding boxes and categories of objects in the image.
Benchmark: PASCAL VOC / Microsoft COCO
In right video: YOLO 2 (You Look Only Once)
Object Detection
https://www.youtube.com/watch?v=VOC3huqHrssg

Task: Assign each image pixel a category label (person, wall, road, dog, ..)
Example Dataset: PASCAL VOC2012 Challenge
SegModel, Deep-Labv2, and many more..
Image Semantic Segmentation

Task: Given an image and a question, answer the question.
Question Answering

Task: Given a dataset of images, generate new artificial image that look real!
State of the art: Generative Adversarial Networks (GANs)
Image Generation

Steering the wheels of self driving cars
Super resolution
Image completion
Saliency detection
Human pose detection
Facial keypoints (nose, eye, ear..)
Image captioning
Activity recognition
And many, many more tasks ….

SagivTech Traffic Lights Detection using DL
With dlib & a few
images from
Google street
view
https://www.youtube.com/watch?v=jg444J2AmOI
https://www.youtube.com/watch?v=YV4y1iqo_TQ

What can you do to get into this world ?
• Theory:
– Get to know basic computer vision and some “classical” algorithms, e.g.
Viola Jones, SIFT, etc.
– Get to know Deep Learning, e.g. CS 231 by Stanford
• Practice:
– Get to know and use OpenCV
– Hands on experience with Caffe, Tensor Flow etc.

Technology and Professional Services company
Established in 2009 and headquartered in Israel
SagivTech Snapshot
• What we do:
• Technological
Solutions
• Projects
• Research
• Core domains:
• Computer Vision
• Deep Learning
• Code Optimization
• GPU Computing

Thanks for the following SagivTech team members and
collaborators:
Acknowledgements
• Prof. Peter Maass, University of Bremen
• Prof. Pierre Vandergheynst, EPFL
• Dov Eilot, SagivTech
• Jacob Gildenblat, SagivTech
• Amir Egozi, SceneNet project

Thank You
F o r m o r e i n f o r m a t i o n p l e a s e c o n t a c t
C h e n S a g i v
c h e n @ s a g i v t e c h . c o m
+ 9 7 2 5 4 7 7 0 6 0 8 9

Introduction talk to Computer Vision

More Related Content

What's hot (20)

Similar to Introduction talk to Computer Vision (20)

Recently uploaded (20)

Introduction talk to Computer Vision