Deep	Learning	in	Computer	Vision
Axon@Grokking
Oct.	28,	2017
Dang	Huynh
Education
• Ph.D.	in	Computer	Science	(France)
Work
• Jan	2017	– now:	Axon	Enterprise
• 2015	– 2016:	Misfit
• 2011	– 2015:	Nokia	Bell	Labs
Research	domains
• Machine	vision.
• Data	science.
• Telecommunication	systems.
2/43
!=
We	are	AXON!
3/43
Outline
•Refresh
•Computer	vision
•Deep	learning	in	Computer	vision
•Theory	vs.	Reality
•Demo
4/43
Refresh
Machine	learning	and	Deep	learning
5/43
Machine	learning
Input data → prediction model → output label
y
x
y	=	F(x)
x0
y0?
6/43
Machine	Learning
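y = 4x_1^3 - 2x_2^2 + 8
[Figure: network diagram with inputs x1, x2 and a +1 unit; hidden units f(x) = x^3 and f(x) = x^2; output y; edge labels weight=1, 0, 0, 1, 4, -2, 8]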
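The example on this slide can be read as a tiny network: each input passes through a fixed activation, and the output is a weighted sum plus a bias. The sketch below assumes the formula reads y = 4*x1^3 - 2*x2^2 + 8; the function name `predict` is made up for illustration.

```python
def predict(x1, x2):
    """Evaluate y = 4*x1**3 - 2*x2**2 + 8 as a two-unit 'network'."""
    h1 = x1 ** 3  # hidden unit with activation f(x) = x^3
    h2 = x2 ** 2  # hidden unit with activation f(x) = x^2
    # Output weights 4 and -2 on the hidden units, weight 8 on the +1 unit.
    return 4 * h1 - 2 * h2 + 8 * 1


print(predict(1, 2))  # 4*1 - 2*4 + 8 = 4
```

In a trained model the weights 4, -2 and 8 would be learned from data rather than written by hand.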
7/43
Machine	Learning
Challenges
• Relevant	data	acquisition
• Data	preprocessing
• Feature	selection
• Model	selection:	simplicity	versus	complexity
• Result	interpretation.
8/43
Deep	Learning
• Machine	Learning	with	many	(deep)	hidden	layers
[Figure: network diagram with an input layer (x1, x2), hidden layers with +1 units, and an output layer (y1, y2)]
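A forward pass through such a network is a chain of weighted sums, each followed by an activation. The sketch below is only an illustration: the layer sizes, the weights, and the helper name `dense` are hypothetical, and the hidden activation is max(0, v), the ReLU covered later in the talk.

```python
def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(v):
    return max(0, v)


def identity(v):
    return v


# Hypothetical weights: 2 inputs -> 3 hidden -> 3 hidden -> 2 outputs.
W1, b1 = [[1, 0], [0, 1], [1, -1]], [0, 0, 0]
W2, b2 = [[1, 1, 1], [1, -1, 0], [0, 0, 1]], [0, 1, 0]
W3, b3 = [[2, 0, 0], [0, 1, 1]], [1, -1]

x = [1, 2]
h1 = dense(x, W1, b1, relu)       # [1, 2, 0]
h2 = dense(h1, W2, b2, relu)      # [3, 0, 0]
y = dense(h2, W3, b3, identity)   # [7, -1]
print(y)
```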
9/43
Why	deep	learning?
[Figure: performance plotted against amount of data, with one curve for deep learning and one for machine learning]
10/43
Computer	Vision
intro
11/43
Make	computers	understand	images	and	video:
- Detection
- Recognition
- Tracking
- Extraction
Computer	Vision
Object detection 12/43
There are still challenges: an object can be…
Computer	Vision
… partly	occluded	
… or	even	fully	occluded.	
13/43
Challenge
We were building a human detector, and we accidentally got a future-human detector!
14/43
15/43
Traditional approach vs. Deep learning
Traditional approach: feature engineering
• Has two eyes?
• Has a nose below the eyes?
• …
• OK, it's a face!
Deep learning approach: NO feature engineering
16/43
ImageNet: 1.2 million images with 1000 object categories
Source: http://pattern-recognition.weebly.com/
[Figure labels: Deep learning, Tradition]
Deep	Learning in Computer	Vision
17/43
Computer	Vision
What	computer	sees
Red
43 45 21
13 34 12
23 88 55
Green
19 89 27
17 57 29
75 56 94
Blue
19 89 27
17 57 29
75 56 94
y	=	F(Red,	Green,	Blue)
3-D	input	array
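The three channel matrices on this slide can be stacked into a single height x width x channels array, which is the input the model F receives. A minimal sketch in plain Python; the helper name `stack_channels` is an assumption, not something from the talk.

```python
red = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
green = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]
blue = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]


def stack_channels(*channels):
    """Combine H x W channel matrices into an H x W x C nested list."""
    height, width = len(channels[0]), len(channels[0][0])
    return [[[ch[r][c] for ch in channels] for c in range(width)]
            for r in range(height)]


image = stack_channels(red, green, blue)
shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)        # (3, 3, 3)
print(image[0][0])  # [43, 19, 19], the top-left pixel as (R, G, B)
```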
Facial	detection
18/43
Intuition
[Figure: network diagram with an input layer (x1, x2), hidden layers with +1 units, and an output layer (y1, y2)]
Facial	detection
Green
Red
Blue
19/43
Convolutional	Neural	Network	(CNN)
Idea: a filter scans over the image.
Output	matrix
Input	matrix	
(e.g.,	image)
Filter	(grey)
Source: https://github.com/vdumoulin/conv_arithmetic
Convolutional	process
20/43
CNN – Striding	and	Padding
Control	how	the	filter	convolves	around	the	input	matrix.
Output	matrix
Input	matrix	
(e.g.,	image)
Filter	(grey)
Source: https://github.com/vdumoulin/conv_arithmetic
Stride	=	2,	Zero-padding	=	1
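Stride and padding determine the size of the output matrix. The sketch below uses the standard formula floor((n + 2p - k) / s) + 1 along one axis. The 5 x 5 input and 3 x 3 filter in the first call are assumed sizes for illustration; the slide itself only states the stride and the padding.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1


# Assumed 5 x 5 input and 3 x 3 filter, with stride 2 and zero-padding 1.
print(conv_output_size(5, 3, stride=2, padding=1))  # 3

# No padding, stride 1: a 7 x 7 input and 3 x 3 filter give a 5 x 5 output.
print(conv_output_size(7, 3))  # 5
```

A larger stride shrinks the output faster, while zero-padding lets the filter cover the border pixels.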
21/43
Convolutional	operation
Input (7 x 7):
0 1 1 1 0 0 0
0 0 1 1 1 0 0
0 0 0 1 1 1 0
0 0 0 1 1 0 0
0 0 1 1 0 0 0
0 1 1 0 0 0 0
1 1 0 0 0 0 0
Filter (3 x 3):
1 0 1
0 1 0
1 0 1
Output (5 x 5):
1 4 3 4 1
1 2 4 3 3
1 2 3 4 1
1 3 3 1 1
3 3 1 1 0
Input * Filter = Output
Input [height1,	width1,	#	of	channels]
Filter [height2,	width2,	#	of	channels]
Output [height3,	width3,	#	of	filters]	
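The numbers on this slide can be reproduced with a few lines of plain Python. This is a minimal single-channel sketch with no padding and stride 1; the function name `conv2d` is chosen here for illustration. As in most CNN libraries, the filter is applied without flipping, which makes no difference for this symmetric filter.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


image = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

output = conv2d(image, kernel)
for row in output:
    print(row)
```

With several channels, the products are also summed across channels, and each filter contributes one slice of the output.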
22/43
Rectified	Linear	Unit	(ReLU)
ReLU:	F(y)	=	max(0,y)
Input:
-3 2 0
1 -1 0
-5 2 4
After ReLU:
0 2 0
1 0 0
0 2 4
Non-linear	activation	function.
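Applied to a matrix, ReLU replaces every negative entry with zero and leaves the rest unchanged. A minimal sketch using the values from this slide:

```python
def relu(matrix):
    """Apply F(y) = max(0, y) to every entry of a matrix."""
    return [[max(0, v) for v in row] for row in matrix]


before = [[-3, 2, 0], [1, -1, 0], [-5, 2, 4]]
after = relu(before)
print(after)  # [[0, 2, 0], [1, 0, 0], [0, 2, 4]]
```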
23/43
Max	Pooling
Input (4 x 4):
1 0 2 3
4 6 6 8
3 1 1 0
1 2 2 4
Output (2 x 2), max pool with 2x2 filter and stride 2:
6 8
3 4
Reduces dimensions and helps limit overfitting.
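Max pooling keeps only the largest value in each window. The sketch below reproduces the 4 x 4 example from this slide; the function name `max_pool` and its default arguments are illustrative choices.

```python
def max_pool(matrix, size=2, stride=2):
    """Take the maximum over each size x size window, moving by stride."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    return [[max(matrix[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


grid = [[1, 0, 2, 3], [4, 6, 6, 8], [3, 1, 1, 0], [1, 2, 2, 4]]
pooled = max_pool(grid)
print(pooled)  # [[6, 8], [3, 4]]
```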
24/43
Example
Input: 24 x 24 x 3
→ Conv: 3 x 3, MP: 2 x 2 → 11 x 11 x 28
→ Conv: 3 x 3, MP: 3 x 3 → 4 x 4 x 48
→ Conv: 2 x 2 → 3 x 3 x 64
→ Fully connected: 128
→ Outputs: face/non-face (2), bounding box regression (4)
Suppose that every max pooling (MP) layer has stride 2.
First stage:
Input: 24 x 24 x 3
Conv: 3 x 3 x 3
MP: 2 x 2 (stride 2)
→ Output dimension: (24 - 3 + 1) / 2 = 11
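The same arithmetic can be carried through every layer of the example. The sketch below assumes convolutions with stride 1 and no padding, max pooling with stride 2, and rounding down when a window does not fit exactly; the helper names are hypothetical.

```python
def conv_out(n, k):
    """Convolution with stride 1 and no padding."""
    return n - k + 1


def pool_out(n, k, stride=2):
    """Max pooling with the given window and stride, rounding down."""
    return (n - k) // stride + 1


layers = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 3), ("conv", 2)]

size = 24
trace = [size]
for kind, k in layers:
    size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
    trace.append(size)

print(trace)  # [24, 22, 11, 9, 4, 3]
```

The values 11, 4 and 3 match the spatial sizes shown on the slide (11 x 11 x 28, 4 x 4 x 48, 3 x 3 x 64); the third number in each is the number of filters in that layer.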
25/43
Object	scales
• Detect objects of various sizes.
Source: https://www.pyimagesearch.com
Input
Tradeoffs?
scans	over
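One common way to handle several object sizes is to scan a fixed-size window over progressively smaller copies of the image (an image pyramid). The sketch below only counts how many windows that produces; the 24-pixel window, the step of 8 and the scale factor of 2.0 are assumptions chosen for illustration.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=24):
    """List successively smaller image sizes while the window still fits."""
    sizes = []
    w, h = width, height
    while w >= min_size and h >= min_size:
        sizes.append((w, h))
        w, h = int(w / scale), int(h / scale)
    return sizes


def window_count(width, height, window=24, step=8):
    """Number of window positions when sliding with the given step."""
    return ((width - window) // step + 1) * ((height - window) // step + 1)


sizes = pyramid_sizes(96, 96, scale=2.0)
total = sum(window_count(w, h) for w, h in sizes)
print(sizes)  # [(96, 96), (48, 48), (24, 24)]
print(total)  # 100 + 16 + 1 = 117
```

This hints at the tradeoff: a smaller scale factor or step covers more sizes and positions, but every extra window is another forward pass through the model.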
26/43
Data	augmentation
• Generate	more	artificial	data	points	from	base	data.
• Apply	with	care to	other	data	types!
[Figure: the same image shown four times: original, little noise, moderate noise, heavy noise]
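Noise-based augmentation can be sketched as adding random values to each pixel and clamping the result to the valid range. The example below assumes Gaussian noise; the sigma values standing in for "little", "moderate" and "heavy", and the function name `add_noise`, are hypothetical.

```python
import random


def add_noise(image, sigma, seed=0):
    """Add Gaussian noise with std. dev. sigma to each pixel, clamped to 0..255."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]


base = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
levels = {"little": 5, "moderate": 20, "heavy": 50}  # assumed noise levels

augmented = {name: add_noise(base, sigma, seed=i)
             for i, (name, sigma) in enumerate(levels.items())}
for name, img in augmented.items():
    print(name, img)
```

Each call produces a new training sample from the same base image, and the label (for example, the face bounding box) stays unchanged.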
27/43
Complex	data	augmentation
Face rotation
28/43
Why	data	augmentation?
WITHOUT augmentation
AXON detection
WITH augmentation
29/43
How	to	benchmark?
Facebook detection 30/43
Theory	vs.	Reality
31/43
Deep	learning	in	Computer	Vision
Pros:
• DL	reduces	the	need	for	feature	engineering.
• DL	outperforms	classical	Computer	Vision	approaches.
Cons:
• DL	requires	a	huge	amount	of	data	(>	100K	samples).
• DL	is	extremely	computationally	expensive	to	train	(weeks	on	GPUs).
• DL	model	structure	is	a	black	box.
32/43
Performance	vs.	Portability
Theory Reality
33/43
Performance	vs.	Power	consumption
Theory Reality
Portable battery
34/43
Special	hardware	for	Deep	Learning
Jetson TX2 (NVIDIA), Google TPU, Movidius Myriad
• Optimized for specific use cases.
• Not plug-and-play: it takes good engineers to make it work.
Still far from consumer-ready…
35/43
Privacy
• The	police	are	our	customers,	so	data	privacy	is	important.
• Can	we	“extract	features”	from	the	private	data?
36/43
Demo
37/43
Workflow	and	tool	set
38/43
Skin	blurring
39/43
Facial	detection	with	tracking
40/43
License	plate	detection
41/43
Take	Home	message
42/43
Industry	perspective
Always	consider	the	following	4Ps:
• Performance
• Power	consumption
• Portability
• Price
Deep learning is not magic: tradeoffs always exist!
43/43
Thank	you
44/43
We	are	Hiring
Full	Stack,	Research	Engineers,	Security.
https://jobs.lever.co/axon
45/43

Grokking TechTalk #21: Deep Learning in Computer Vision