Binary Features
Steven C. Mitchell, Ph.D.
Componica, LLC
What’s a Binary Feature?
-Let’s take an image and sample a region of interest, a 4x4 patch. Maybe you’re looking for
a face, a tumor, or a gun.
-In a typical object detection system, this region of interest is scanned across the image
over different scales.
-Typically you scan left-to-right, top-to-bottom in steps of 10% of the patch size. Then
you shrink the image (or scale the patch) by 20% and start over. Continue doing that until the
image becomes too small or you’ve found what you’re looking for (see the sketch below).
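To make the scan concrete, here’s a bare-bones sketch (my own illustration, not code from the talk); is_object() is a hypothetical stand-in for whatever detector the rest of the talk builds:

```c
#include <stdio.h>

/* Slide a square window across the image in steps of 10% of the window
 * size; instead of shrinking the image by 20%, equivalently grow the
 * window by ~25% each octave, until the window no longer fits. */
void scan(int img_w, int img_h, int patch,
          int (*is_object)(int x, int y, int size)) {
    for (int size = patch; size <= img_w && size <= img_h;
         size = (int)(size * 1.25)) {
        int step = size / 10;            /* 10% of the window size */
        if (step < 1) step = 1;
        for (int y = 0; y + size <= img_h; y += step)
            for (int x = 0; x + size <= img_w; x += step)
                if (is_object(x, y, size))
                    printf("hit at (%d,%d), size %d\n", x, y, size);
    }
}
```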
-So let’s start with this patch (we’ll assume only gray values; forget about color for now).
-First, the pixels have values, typically from 0 to 255.
-Now we also need a way of addressing the locations of these pixels. I’ll use a simple
numbering scheme, as the patches will always be 4x4.
-Lastly, I want to compare the brightness of two pixels. I’ll pick locations 5 and 11.
-Why those two locations? In a later slide, I’ll explain how locations are chosen.
-Ok, let’s try different patches with the same binary feature, that is, compare locations 5 and
11.
-Now imagine I try a whole bunch of pairs on a given patch: 2 vs 14, 8 vs 4, 7 vs 2, etc. I’m
going to get a bunch of yes/no responses based on the patch I happen to show the system.
Different Types of Binary Features
-Of course there are many different types of binary features, different types of questions I
can ask.
-Simple thresholding, which pixel is brighter, which pixel is brighter by at least a threshold,
how similar two pixels are.
-With color, it could be comparisons of different channels.
-The main points are: each feature has a fixed set of parameters, discovered during training
and fixed for recognition, and the output is a yes or no.
-BTW, I really like the simple comparison of two pixels. It’s fast, and it returns the same
result under any change to the brightness / contrast of a patch. A few of these tests are
sketched below.
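Here’s what those feature types look like in code — a minimal sketch under my own naming, for an 8-bit grayscale 4x4 patch addressed by locations 0..15 as in the slides:

```c
#include <stdint.h>

/* The I[a] < I[b] test from the slides: invariant to any brightness or
 * contrast change applied uniformly to the patch. */
static int brighter(const uint8_t patch[16], int a, int b) {
    return patch[a] < patch[b];
}

/* Thresholded variant: is pixel b brighter than pixel a by more than t? */
static int brighter_by(const uint8_t patch[16], int a, int b, int t) {
    return (int)patch[b] - (int)patch[a] > t;
}

/* Similarity variant: are the two pixels within t of each other? */
static int similar(const uint8_t patch[16], int a, int b, int t) {
    int d = (int)patch[a] - (int)patch[b];
    return (d < 0 ? -d : d) <= t;
}
```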
Decision Tree Overview
-Now in order to make use of these features, let’s talk about decision trees.
(Diagram: a decision tree for “Did it rain last night?” — root question “Is grass wet?”, follow-up question “Did you water the grass?”, with YES/NO probability histograms at the leaves.)
-Let’s say you’re trying to determine if it rained last night.
-This is a classification problem.
-Here I constructed a simple decision tree based on a couple of yes/no questions.
-At the leaves of this tree are probability histograms created from my data.
-Each histogram sums to one.
-My decision is based on which of the two bars is greater at each leaf.
Selecting Good Questions
(Diagram: one-level trees for “Is grass wet?” and for an irrelevant question, “Do you like oranges?”, each with Y/N leaf histograms.)
-So how do I pick a good question? First pick a question from my universe of questions, pour
my data through it, and measure how well it predicts.
-Three commonly used metrics: Entropy, Gini Impurity, and Classification Error.
-What they basically measure is how far away you are from a 50/50 coin toss.
-Here you can see an irrelevant question like “Do you like oranges?” would yield a flat
distribution. This would yield a high entropy, Gini impurity, or classification error. These
metrics are sketched below.
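For reference, here’s how those three metrics are computed from the class counts at a node — a standard-textbook sketch, not code from the talk:

```c
#include <math.h>

/* counts[i] = number of training samples of class i reaching this node. */
double node_entropy(const int *counts, int nclasses) {
    int total = 0;
    for (int i = 0; i < nclasses; i++) total += counts[i];
    double h = 0.0;
    for (int i = 0; i < nclasses; i++) {
        if (counts[i] == 0) continue;
        double p = (double)counts[i] / total;
        h -= p * log2(p);       /* 0 for a pure node, 1 for a 50/50 toss */
    }
    return h;
}

double gini_impurity(const int *counts, int nclasses) {
    int total = 0;
    double sum_sq = 0.0;
    for (int i = 0; i < nclasses; i++) total += counts[i];
    for (int i = 0; i < nclasses; i++) {
        double p = (double)counts[i] / total;
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;        /* 0 pure, 0.5 for a 50/50 two-class node */
}

double classification_error(const int *counts, int nclasses) {
    int total = 0, max = 0;
    for (int i = 0; i < nclasses; i++) {
        total += counts[i];
        if (counts[i] > max) max = counts[i];
    }
    return 1.0 - (double)max / total;
}
```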
(Diagram: a two-level binary-feature tree — root test I[5] < I[11], a follow-up test I[7] < I[3], and YES/NO probability histograms at the leaves.)
-Going back to binary features, the questions we ask are based on pixel comparisons.
-How do we pick the parameters? We randomly sample from the universe of parameters
and choose the one that yields a good score on the given dataset.
-In the 4x4 patch, I would pick two random numbers from 0 to 15 (no duplicates) and a
random threshold (if I need one), add that feature to the tree, and then test my tree on my
dataset and compute a score. I’ll do this 2000 times and keep the binary feature that
produced the tree with the best score. I then keep growing my tree in a greedy fashion
until it’s big enough (5-9 levels deep) or accurate enough. A sketch of this search follows below.
-This answers the question of where x, y, and T come from.
-In my experience a sampling of 500-2000 works really well, with diminishing returns
for anything higher.
-This is the most time-consuming part of building these trees, but it’s extremely
parallelizable.
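Here’s what that greedy split search might look like — a sketch under assumed names (score() stands in for whichever of the metrics above you pour the dataset through):

```c
#include <stdlib.h>

typedef struct { int a, b; } BinaryFeature;   /* the I[a] < I[b] test */

/* Sample random candidate features and keep the best-scoring one.
 * num_candidates of 500-2000 is the sweet spot mentioned above. */
BinaryFeature best_random_split(double (*score)(BinaryFeature),
                                int num_candidates) {
    BinaryFeature best = { 0, 1 };
    double best_score = -1.0;
    for (int k = 0; k < num_candidates; k++) {
        BinaryFeature f;
        f.a = rand() % 16;                              /* locations 0..15 */
        do { f.b = rand() % 16; } while (f.b == f.a);   /* no duplicates   */
        double s = score(f);    /* e.g. information gain on the dataset */
        if (s > best_score) { best_score = s; best = f; }
    }
    return best;
}
```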
Selecting Good Questions
(Diagram: the “Is grass wet?” and “Do you like oranges?” trees again, now used for regression.)
-Now, that’s for classification. Decision trees can be used for regression too.
-Instead of classes like yes/no or cat/dog/horse, the output is the average value at the
leaves over my dataset.
-What makes a good question? The ones that decrease the variance around those averages.
-Also note, the output can be multi-dimensional, not necessarily a single value. You can
compute the variance of multi-dimensional things fairly easily, don’t worry. A sketch is below.
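One way to score that, sketched in my own code: use the total variance around the per-dimension means (the trace of the covariance), which reduces to the ordinary variance when the output is one-dimensional:

```c
/* samples: n output vectors of dimension dim, flattened row-major.
 * A good regression split minimizes the weighted sum of this quantity
 * over the two children. */
double vector_variance(const double *samples, int n, int dim) {
    double var = 0.0;
    for (int d = 0; d < dim; d++) {
        double mean = 0.0;
        for (int i = 0; i < n; i++) mean += samples[i * dim + d];
        mean /= n;
        for (int i = 0; i < n; i++) {
            double diff = samples[i * dim + d] - mean;
            var += diff * diff;
        }
    }
    return var / n;
}
```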
(Diagram: the two-level binary-feature tree again — I[5] < I[11], then I[7] < I[3] — with values at the leaves instead of class histograms.)
-So here is a binary feature tree that returns a value (like the probability it’s an object)
instead of a class... or it could be a vector, like landmarks (see the sketch below).
-Now we can start constructing interesting solutions using these concepts.
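A tiny sketch of such a tree (my own layout, not anyone’s published code): internal nodes hold a pixel pair, leaves hold the regressed output:

```c
#include <stddef.h>

typedef struct Node {
    int a, b;               /* internal node: test patch[a] < patch[b] */
    struct Node *yes, *no;  /* children; both NULL at a leaf           */
    float value;            /* leaf output, e.g. P(object)             */
} Node;

float predict(const Node *n, const unsigned char patch[16]) {
    while (n->yes != NULL)                        /* descend to a leaf */
        n = (patch[n->a] < patch[n->b]) ? n->yes : n->no;
    return n->value;
}
```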
Corner Detector
-First let’s start with corner detection.
Harris Corner Detector
1. Compute a smooth gradient in X and Y.
2. For each pixel, compute the 2x2 second-moment matrix of the gradients.
3. Solve for the corner response R.
4. Non-maximum suppression to gather corners.
-The Harris Corner Detector is one of the simplest ways to detect corners, based on estimating
the 2nd derivative of the sum of squared differences between two shifted patches.
-There are others: SURF, SIFT, SUSAN, etc.
-So what’s the point? These points are stable regardless of angle, scale, or translation.
-This reduces the data such that you can rapidly compare the image to a template for
techniques like augmented reality, image stitching, and motion tracking.
-So you can find corners using these four easy steps... wait... lots of math... slow...
FAST Corner Detector
Given a pixel, based on the 16 surrounding pixels, is this location a corner?
FAST uses a decision tree trained on real images and converted to nested if
statements in C.
It doesn’t use math, and averages about 3 comparisons per pixel... very very FAST (the
plain segment test is sketched below).
http://mi.eng.cam.ac.uk/~er258/work/fast.html
-Ok, enough of that. Let’s use a more machine-learning approach...
FAST: Features from Accelerated Segment Test
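For intuition, here’s the segment test written out naively (my own code, not the generated tree): a pixel is a FAST-9 corner if 9 contiguous pixels on the 16-pixel circle of radius 3 are all brighter than p + t or all darker than p - t. The trained decision tree reaches the same answer in about 3 comparisons:

```c
/* Offsets of the 16 pixels on a Bresenham circle of radius 3. */
static const int cdx[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3,-3,-3,-2,-1};
static const int cdy[16] = {-3,-3,-2,-1, 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3};

int is_fast9_corner(const unsigned char *img, int stride,
                    int x, int y, int t) {
    int p = img[y * stride + x];
    for (int start = 0; start < 16; start++) {
        int all_brighter = 1, all_darker = 1;
        for (int k = 0; k < 9; k++) {            /* 9 contiguous pixels */
            int i = (start + k) & 15;
            int q = img[(y + cdy[i]) * stride + (x + cdx[i])];
            if (q <= p + t) all_brighter = 0;
            if (q >= p - t) all_darker = 0;
        }
        if (all_brighter || all_darker) return 1;
    }
    return 0;
}
```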
FAST Corner Detector
The source code is computer generated,
and free for anyone to use.
It is 6000 lines long and not
comprehensible.
With an averaging of vectors and an
arctangent, you can get a rotation vector
cheaply.
(Figure excerpt from “Multiple Target Localisation at over 100 FPS”: the orientation comes from the sum of the gradients between opposite pixels in the circle; FAST-9 is used as the interest point detector.)
http://mi.eng.cam.ac.uk/~er258/work/fast.html
FAST Example
-Here’s a picture of your’s truly and a Starbuck’s Logo that I ran for a project.
-The lines indicate a direction derived from that rotation vector in the last slide. It’s useful for
normalizing patches like if you were to create an augmented reality system on a mobile
device.
-Here is some random dude’s youtube video running FAST. I’d show you my own, but I didn’t
have enough time.
-Notice it’s running in realtime off a slow iPhone 3, Harris Corners and SURF would drag on
such a device. Just as a note, Mobile phones typical run 10x-30x slower than desktops.
Keypoint Recognition
-Once you have corners, the next step is to identify what those corners belong to.
Keypoint Recognition
Fast Keypoint Recognition using Random Ferns
Mustafa Özuysal, Michael Calonder, Vincent Lepetit and Pascal Fua
-So in an image stitching problem, an augmented reality solution, or a bag-of-words object
recognizer (Amazon’s product-ID thingy), you sample a region of interest around each
corner and try to match it with a known template.
-Comparisons are often non-trivial because you have to normalize the patches against
distortions caused by rotation and tilt, normalize the brightness, and then come up with
some feature vector from the patches.
-Finally you measure the distances between the feature vectors of each patch in the template
and the image. That’s like an O(n^2) deal there.
-Everything about this sounds really slow on an iPhone.
-Ok, let’s use binary feature trees to solve this.
Fast Keypoint Recognition using Random Ferns
Mustafa Özuysal, Michael Calonder, Vincent Lepetit and Pascal Fua
-First, generate patches from each corner in the original template with random orientations,
sizes, and tilts. Generate a ton of them, because that’s our training set.
Fast Keypoint Recognition using Random Ferns
Mustafa Özuysal, Michael Calonder, Vincent Lepetit and Pascal Fua
-Next, these guys simplified the decision tree concept with something they
dubbed ferns (or primitive trees).
-The idea is that if you ask the same question at each depth, you can collapse the tree into
bits of an index. The leaves are simply locations in an array.
-So for example, three bits gives 2^3 or 8 possible outcomes. So instead of a tree, you have an
array of 8 probability histograms.
-Next, the selection of classes is based on a simple max of the class probabilities for a
given set of bits, but you’re probably going to need a lot of bits to get a good result (they
determined this empirically).
-Now if you assume independence of the features, you can reduce this to products of
several ferns. A sketch follows below.
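Here’s that idea as code — a compact sketch in my own notation, not the paper’s implementation. Each fern’s comparisons build a bit index into a table of per-class log-probabilities, and independence turns the product over ferns into a sum of logs:

```c
#define NUM_FERNS   30
#define FERN_BITS   10                 /* 2^10 = 1024 leaves per fern */
#define NUM_CLASSES 200                /* keypoints to recognize      */

typedef struct {
    int a[FERN_BITS], b[FERN_BITS];            /* pixel pairs in the patch */
    float logp[1 << FERN_BITS][NUM_CLASSES];   /* trained leaf histograms  */
} Fern;

/* patch: the keypoint's surrounding pixels, addressed linearly. */
int classify(const Fern *ferns, const unsigned char *patch) {
    float score[NUM_CLASSES] = { 0 };
    for (int f = 0; f < NUM_FERNS; f++) {
        int idx = 0;
        for (int s = 0; s < FERN_BITS; s++)    /* answers become bits */
            idx = (idx << 1) | (patch[ferns[f].a[s]] < patch[ferns[f].b[s]]);
        for (int c = 0; c < NUM_CLASSES; c++)
            score[c] += ferns[f].logp[idx][c]; /* product -> sum of logs */
    }
    int best = 0;
    for (int c = 1; c < NUM_CLASSES; c++)
        if (score[c] > score[best]) best = c;
    return best;                               /* max class probability */
}
```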
Efficient Keypoint Recognition, Lepetit et al
(Diagram, repeated across three slides: different patches answer the same three pixel comparisons with different bit strings, e.g. 110₂ = 6, 001₂ = 1, 101₂ = 5, and each string indexes a leaf in the fern’s array.)
Fast Keypoint Recognition in Ten Lines of Code
Mustafa Özuysal, Pascal Fua and Vincent Lepetit
-This whole algorithm can be expressed in just 10 lines of C code.
-Very very fast.
From Bits to Images
-So these binary trees toss away all the gray values. Do they really characterize images well
enough to solve serious problems?
-Ok, let’s say we took an image, found corners, and sampled binary pairs from 32x32 patches
(a few hundred pairs). Can we reconstruct an image from just the locations of the corners, the
patch size, and the binary pairs?
From Bits to Images: Inversion of Local Binary Descriptors
Emmanuel d’Angelo, Laurent Jacques, Alexandre Alahi and Pierre Vandergheynst
-Yes we can. It’s a bit like solving Sudoku.
-What’s really surprising is how much information we can capture without any gray levels.
-So you’re collecting edge information over different scales; plus, since it’s just simple
comparisons, it’s immune to brightness / contrast issues and global lighting.
-In many ways it’s superior to other means of characterizing images.
Object Detection
-Let’s talk about object detection.
Viola / Jones Object Detection
"Robust Real-time Object Detection"
Paul Viola and Michael Jones
-The Viola-Jones object detection framework was formulated in the early 2000s and was a
breakthrough in object detection. Cheap cameras and cellphones use it all the time.
-It works by measuring differences of sums over rectangles and taking a threshold. If the
difference exceeds a certain value, it’s a face.
-Now of course that’s a very poor system of face detection, so they strengthened it using
the principles of ensemble learning.
-That is, yes, one rectangle comparison makes an awful face detector, but if you have a
large number of independent detectors and do a weighted vote, you’ll end up with a much
more accurate detector.
-Wisdom of crowds.
-The AdaBoost algorithm shown here is a method of determining the weighting: basically,
give a higher vote to the more accurate detectors, then retrain on the dataset with more
weight on the incorrectly classified samples. Repeat. A sketch of one round is below.
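For reference, one round of the standard discrete AdaBoost weighting looks like this (a textbook sketch, my own code, assuming the weighted error stays strictly between 0 and 1/2):

```c
#include <math.h>

/* labels[i], preds[i] in {-1,+1}; w[i] are sample weights summing to 1.
 * Returns the detector's vote weight alpha and reweights the samples. */
double adaboost_round(const int *labels, const int *preds,
                      double *w, int n) {
    double err = 0.0;
    for (int i = 0; i < n; i++)
        if (preds[i] != labels[i]) err += w[i];      /* weighted error */
    double alpha = 0.5 * log((1.0 - err) / err);     /* accurate => big vote */
    double z = 0.0;
    for (int i = 0; i < n; i++) {
        w[i] *= exp(-alpha * labels[i] * preds[i]);  /* boost the mistakes */
        z += w[i];
    }
    for (int i = 0; i < n; i++) w[i] /= z;           /* renormalize */
    return alpha;
}
```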
Viola / Jones Object Detection
Figure 2: The integral image. Left: A simple input of image values. Center: The computed integral image. Right:
Using the integral image to calculate the sum over rectangle D.
3 The Technique
Our adaptive thresholding technique is a simple extension of Wellner’s method [Wellner 1993]. The main idea
in Wellner’s algorithm is that each pixel is compared to an average of the surrounding pixels. Specifically, an
approximate moving average of the last s pixels seen is calculated while traversing the image. If the value of the
current pixel is t percent lower than the average then it is set to black, otherwise it is set to white. This method works
because comparing a pixel to the average of nearby pixels will preserve hard contrast lines and ignore soft gradient
changes. The advantage of this method is that only a single pass through the image is required. Wellner uses 1/8th
of the image width for the value of s and 15 for the value of t. However, a problem with this method is that it is
dependent on the scanning order of the pixels. In addition, the moving average is not a good representation of the
surrounding pixels at each step because the neighbourhood samples are not evenly distributed in all directions. By
using the integral image (and sacrificing one additional iteration through the image), we present a solution that does
not suffer from these problems. Our technique is clean, straightforward, easy to code, and produces the same output
independently of how the image is processed. Instead of computing a running average of the last s pixels seen, we
compute the average of an s x s window of pixels centered around each pixel. This is a better average for comparison
since it considers neighbouring pixels on all sides. The average computation is accomplished in linear time by using
the integral image. We calculate the integral image in the first pass through the input image. In a second pass, we
compute the s x s average using the integral image for each pixel in constant time and then perform the comparison.
If the value of the current pixel is t percent less than this average then it is set to black, otherwise it is set to white.
The following pseudocode demonstrates our technique for input image in, output binary image out, image width w
and image height h.
procedure AdaptiveThreshold(in,out,w,h)
1: for i = 0 to w do
2:   sum ← 0
3:   for j = 0 to h do
4:     sum ← sum + in[i, j]
5:     if i = 0 then
(excerpt continues in the paper)
we can use an integral image and achieve a constant number of operations per rectangle with
preprocessing. To compute the integral image, we store at each location I(x,y) the sum of all
f(x,y) terms to the left and above (x,y). This is accomplished in linear time using the following
equation for each pixel (taking care of border cases):

I(x,y) = f(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1).     (1)

Figure 2 (left and center) illustrates the computation of an integral image. Once we have the
integral image, the sum of f(x,y) for any rectangle with upper left corner (x1,y1) and lower right
corner (x2,y2) can be computed in constant time using the following equation:

Σ_{x=x1..x2} Σ_{y=y1..y2} f(x,y) = I(x2,y2) - I(x2,y1-1) - I(x1-1,y2) + I(x1-1,y1-1).     (2)

Figure 2 (right) illustrates that computing the sum of f(x,y) over the rectangle D using Equation 2
is equivalent to computing the sums over the rectangles (A+B+C+D) - (A+B) - (A+C) + A.
D. Bradley, G. Roth, Adaptive Thresholding using the
Integral Image. J. Graphics Tools 12(2): 13-21 (2007)
-The other trick in Viola-Jones was the fast method of summing the rectangles using an
integral image.
-If you construct an integral image by summing the pixels to the left and above while
subtracting the upper-left pixel, you can rapidly compute any rectangle sum using the above
equation (see the sketch below).
-The problem is that constructing integral images can be slow, plus you’re doing 8 memory
operations per feature.
-Binary features with pixel comparisons can do it with two, without even constructing an
integral image or doing brightness / contrast normalization.
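Here’s the integral-image trick itself, sketched in my own code (a zero top row and left column avoid the border cases):

```c
#include <stdlib.h>

/* Build a (w+1) x (h+1) integral image: ii(x,y) = sum of img above/left. */
long *build_integral(const unsigned char *img, int w, int h) {
    long *ii = calloc((size_t)(w + 1) * (h + 1), sizeof *ii);
    for (int y = 1; y <= h; y++)
        for (int x = 1; x <= w; x++)
            ii[y * (w + 1) + x] = img[(y - 1) * w + (x - 1)]
                                + ii[(y - 1) * (w + 1) + x]
                                + ii[y * (w + 1) + x - 1]
                                - ii[(y - 1) * (w + 1) + x - 1];
    return ii;
}

/* Sum over the rectangle [x1,x2] x [y1,y2], inclusive, in 4 lookups. */
long rect_sum(const long *ii, int w, int x1, int y1, int x2, int y2) {
    return ii[(y2 + 1) * (w + 1) + x2 + 1]
         - ii[ y1      * (w + 1) + x2 + 1]
         - ii[(y2 + 1) * (w + 1) + x1]
         + ii[ y1      * (w + 1) + x1];
}
```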
Binary Feature-Based Object Detection
Unconstrained Face Detection
Shengcai Liao, Anil K. Jain, and Stan Z. Li
(Diagram: the same two-level pixel-comparison tree — I[5] < I[11], then I[7] < I[3] — now used as a face / not-face classifier.)
Object Detection with Pixel Intensity Comparisons Organized in Decision Trees
Nenad Markus, Miroslav Frljak, Igor S. Pandzic, Jorgen Ahlberg, and Robert Forchheimer
-This technique was simultaneously published by several groups.
-Here is Nenad Markus’ implementation.
-It runs 30x faster than Viola-Jones and 9x faster than the Local Binary Patterns approach in
OpenCV.
-Here he accomplishes rotational invariance by rotating the trees N times; however, it’s fast
enough that that’s feasible.
Object Landmarking
Face Alignment by Explicit Shape Regression, Cao et al
-Microsoft has been putting a lot of effort into deriving methods for landmarking faces.
-For some reason they call it face alignment. We tend to call it landmarking or
segmentation.
-Basically, find points on an object that may or may not represent contours of that object.
Based on: Face Alignment by Explicit Shape Regression, Cao et al
(Diagram: the regression cascade — affine-transform the shape to the mean shape, apply a stage of trees (“insert magic”), transform back, repeated for t = 0, 1, 2, ..., 10.)
-Here is one of their approaches to landmarking faces using regression trees.
-Dubbed Explicit Shape Regression.
-Typically done with 10 groups of trees.
-Each group is hundreds of trees refining the shape vector from the previous group.
-Although they don’t say it, they’re effectively using a Gradient Boosting approach with
regression trees and a shrinkage (lambda) of one. A slightly lower lambda would improve
generalization, but most likely they were not aware of this. A sketch of the update is below.
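In code, the boosted update is just a scaled accumulation of tree outputs — a schematic sketch of my reading of the method, not the paper’s implementation (tree evaluation is hidden behind a function-pointer type):

```c
#define NPOINTS 68                      /* e.g. 68 facial landmarks */

typedef struct { float x[NPOINTS], y[NPOINTS]; } Shape;

/* One regression tree: walks pixel comparisons indexed relative to the
 * current shape estimate and returns a small shape delta. */
typedef Shape (*TreeFn)(const unsigned char *img, const Shape *s);

void boost_stage(const TreeFn *trees, int ntrees, float lambda,
                 const unsigned char *img, Shape *s) {
    for (int t = 0; t < ntrees; t++) {
        Shape d = trees[t](img, s);
        for (int i = 0; i < NPOINTS; i++) {   /* S += lambda * deltaS; */
            s->x[i] += lambda * d.x[i];       /* the paper effectively */
            s->y[i] += lambda * d.y[i];       /* uses lambda = 1       */
        }
    }
}
```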
Face Alignment by Explicit Shape Regression, Cao et al
What’s inside?
(Diagram: a regression tree whose tests compare pixels at shape-indexed locations, e.g. I[S5+∆] < I[S11+∆], then I[S7+∆] < I[S3+∆], with shape deltas at the leaves.)
-So each regression tree is between 5-9 levels deep.
-Pixel comparisons are made at locations relative to the landmarks, S.
-One comparison requires two landmark indices (i, j) and an x/y delta from each landmark.
-The affine transform to the mean shape in the other slide removes any need to care about scale.
-The leaves store delta-S’s that move S closer to the target.
Face Alignment by Explicit Shape Regression, Cao et al
-An average face, S^0, is placed on the image using a face detector like Viola-Jones, LBP,
or that tree thing I just talked about.
-The shape is refined to the image using groups of trees followed by affine transform
adjustments.
-Here are examples of landmarked faces.
-The original paper makes the argument that all generated landmarks are linear
combinations of training faces. It implicitly creates a shape model of faces, so you don’t need
to worry about generating nonsensical faces.
In Conclusion
I just presented a small subset of a very large topic.
The comparison of two pixels is a surprisingly useful
feature that’s very easy to compute.
Combined with decision trees and ferns, these
techniques substitute math with machine learning.
This enables complicated object recognition
techniques to run in realtime on mobile devices.