Darius Burschka 
Machine Vision and Perception Group (MVP) 
Department of Computer Science 
Technische Universität München
Machine Vision and Perception MVP Group @ TUM 
What is the information in a single 
image? 
• Reduction of dimensionality 
• Horizontal and vertical dimensions are scaled with the distance to the object, so distant objects appear smaller
Computer Vision vs. Human Vision 
Machine Vision and Perception MVP Group @ TUM 
What is the correct image information?
Machine Vision and Perception MVP Group @ TUM 
Illusions: What do they tell us about 
perception of humans? 
Geometry is perceived with strong prior assumptions
Machine Vision and Perception MVP Group @ TUM 
Illusions: what do they tell us about 
our brightness perception? 
Brightness is not perceived as a direct measurement but under strong assumptions about the position and brightness of the light source
Machine Vision and Perception MVP Group @ TUM 
1. INTRODUCTION 
What does a single camera image tell 
us? 
reference image 
Figure 1.1: (a) A good match between a template image representing the object of interest (left) and the scene (right). (b) However, it turns out that the beer bottle is a sticker put on the surface.
Machine Vision and Perception MVP Group @ TUM 
Camera as measurement device?
Machine Vision and Perception MVP Group @ TUM 
Visual Data Matching 
Levels of visual data structures: Iconic images, Segmented images, Geometric representations, Relational models 
Application examples: Multiple People Tracking, Motion Extraction, Object Registration, Object Recognition, Semantic Map, Object Modeling, Activation Detection, Scene Labeling, Object Segmentation 
Image modalities: Intensity, Color, Depth 
Figure 1.2: Different-level visual data structures such as iconic images, segmented images, geometric representations and relational models, with some vision application examples. 
Image courtesy of Wei Wang
Machine Vision and Perception MVP Group @ TUM 
Matching modalities in the images 
• Direct image content 
• Texture/Pattern 
• Color 
• Pre-processed image features 
Only image data 
• Lines 
• Corners 
• Keypoints + Descriptors (SIFT, SURF, FAST, AGAST; a matching sketch follows this list) 
• Derived features 
With external data 
• Depth information 
• Homographies 
• Structural relations between images (e.g. plane tracking)
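As a quick illustration of the keypoint + descriptor modality listed above, the sketch below matches SIFT features between two images. It is a minimal example under assumed inputs (opencv-python with SIFT available, two hypothetical image files), not a component of the systems shown later.

```python
import cv2

# Hypothetical input: two views of the same scene.
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to drop ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences")
```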
Machine Vision and Perception MVP Group @ TUM 
What can be tracked/matched? 
Color segmentation Pattern tracking Depth processing Image processing 
Dynamic Vision 
(XVision) 
algorithms
Machine Vision and Perception MVP Group @ TUM 
What can be tracked/matched? 
Dynamic Vision 
(XVision) 
algorithms 
applications
Machine Vision and Perception MVP Group @ TUM 
Color based Blob Tracking
Machine Vision and Perception MVP Group @ TUM 
ICRA 2001
Machine Vision and Perception MVP Group @ TUM 
Application Manipulation in 2D
Machine Vision and Perception MVP Group @ TUM 
Efficient Pattern Tracker (SSD) 
XVision supports fast and effective robust filtering to handle unexpected changes in illumination and in the composition of the tracked target 
[Block diagram with: current image, image warping, reference template, weighting, inverse model, Σ, warp parameters p, update Δp]
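A minimal, translation-only sketch of such an SSD tracker (assumed names and simplifications; not the XVision implementation): it cuts the current patch out of the image, compares it with the reference template, and solves a linearized least-squares problem for the parameter update Δp.

```python
import numpy as np

def ssd_track_translation(template, image, p, iters=10):
    """Estimate the 2D translation p = (tx, ty) of `template` in `image`
    by iterative SSD minimization (inverse-compositional style update)."""
    p = np.asarray(p, dtype=float)
    h, w = template.shape
    gy, gx = np.gradient(template.astype(float))         # template gradients
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)       # Jacobian w.r.t. (tx, ty)
    H = J.T @ J                                           # Gauss-Newton Hessian
    for _ in range(iters):
        x0, y0 = int(round(p[0])), int(round(p[1]))
        patch = image[y0:y0 + h, x0:x0 + w].astype(float)
        if patch.shape != template.shape:                 # left the image
            break
        e = (patch - template).ravel()                    # SSD residual
        dp = np.linalg.solve(H, -J.T @ e)                 # least-squares step
        p = p + dp
        if np.linalg.norm(dp) < 1e-2:
            break
    return p
```

A real tracker like the one on the slide would additionally warp with sub-pixel interpolation, estimate more warp parameters than pure translation, and apply the robust weighting mentioned above.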
Machine Vision and Perception MVP Group @ TUM
Machine Vision and Perception MVP Group @ TUM 
Direct Navigation in 3D data 
(plane tracking ICRA 2003) 
In indoor environments many surfaces can be approximated with planes E : ax + by + cz = d. In a stereo system with non-verged, unit focal length (f = 1) cameras the image planes are coplanar. In this case, the disparity value D(u, v) of a point (u, v) in the image can be estimated from its depth z to the corresponding location (u, v) on the plane in the other image:

D(u, v) = B / z,    (2)

with B describing the distance between the cameras of the stereo system [16].
We estimate the disparity D(u, v) of the plane E at an image point (u, v) using the unit focal length camera (f = 1) projection:

z ≠ 0 :  a·(x/z) + b·(y/z) + c = d/z   ⇒   au + bv + c = k · D(u, v),    (3)

with u = x/z, v = y/z, k = d/B.
The vector n = (a b c)T is normal to the plane E and describes the orientation of the plane relative to the camera. Equation (3) can be written in the form

D(u, v) = (ρ1 ρ2 ρ3) · (u v 1)T = n∗ · (u v 1)T,    (4)

with ρ1 = a/k, ρ2 = b/k, ρ3 = c/k.
This form uses modified parameters ρ1, ρ2, ρ3 of the plane E relating the image data (u, v) to D(u, v).
For tracking, the residuals e(u, v) over the tracked region are stacked into an error vector E(δp) ≈ [e(u1, v1), e(u1, v2), ..., e(um, vn)]T and the expression is linearized about the current parameters p (Taylor series; higher-order terms are neglected).
2.2.2 Mask Management: a mask folded into the weighting matrix controls each pixel's inclusion in the estimate.
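To make equation (4) concrete, the sketch below fits the plane parameters ρ = (ρ1, ρ2, ρ3) to sampled disparities by linear least squares and predicts the disparity at a new pixel. It is an illustrative example on synthetic data, not the ICRA 2003 tracker itself.

```python
import numpy as np

# Ground-truth plane parameters rho, so that D(u, v) = rho1*u + rho2*v + rho3 (eq. 4).
rho_true = np.array([0.08, -0.03, 1.5])

# Sample noisy disparities on a grid of unit-focal-length image coordinates.
u, v = np.meshgrid(np.linspace(-0.5, 0.5, 20), np.linspace(-0.4, 0.4, 15))
A = np.stack([u.ravel(), v.ravel(), np.ones(u.size)], axis=1)
D = A @ rho_true + 0.002 * np.random.randn(u.size)

# Least-squares estimate of the plane parameters from the measured disparities.
rho_est, *_ = np.linalg.lstsq(A, D, rcond=None)
print("estimated rho:", rho_est)

# Predicted disparity at an arbitrary image point (u, v) = (0.1, 0.2).
print("D(0.1, 0.2) =", rho_est @ np.array([0.1, 0.2, 1.0]))
```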
Machine Vision and Perception MVP Group @ TUM 
Local Feature Tracking Algorithms 
• Image-gradient based → Extended KLT (ExtKLT) 
• patch-based implementation 
• feature propagation 
• corner-binding 
+ sub-pixel accuracy 
• algorithm scales badly with the number of features 
• Tracking-by-Matching → AGAST tracker 
• AGAST corner detector 
• efficient descriptor 
• high frame rates (hundreds of features in a few milliseconds) 
+ algorithm scales well with the number of features 
• pixel accuracy 
(a generic KLT-style tracking sketch follows below)
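For comparison, a generic image-gradient tracking loop using OpenCV's pyramidal Lucas-Kanade; this is a minimal sketch assuming opencv-python and two hypothetical consecutive frames, not the ExtKLT or AGAST tracker from the slide.

```python
import cv2

# Hypothetical consecutive frames of a sequence.
prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Select corners to track in the first frame.
pts0 = cv2.goodFeaturesToTrack(prev, maxCorners=300, qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade tracking into the next frame (sub-pixel accuracy).
pts1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts0, None,
                                             winSize=(21, 21), maxLevel=3)
tracked = status.ravel() == 1
print(f"tracked {tracked.sum()} of {len(pts0)} features")
```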
Machine Vision and Perception MVP Group @ TUM 
Adaptive and Generic Accelerated Segment Test 
(AGAST) 
Improvements necessary for embedded processors: 
• full exploration of the configuration space by backward-induction (no learning) 
• binary decision tree (not ternary) 
• computation of the actual probability and processing costs 
(no greedy algorithm) 
• automatic scene adaptation by tree switching (at no cost) 
• various corner pattern sizes (not just one) 
No drawbacks! 
Mair, Hager, Burschka, Suppa, Hirzinger 
ECCV, Springer, 2010 
E. Rosten
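OpenCV ships an implementation of this detector, so AGAST corners can be tried directly; a minimal sketch assuming opencv-python and a hypothetical input image:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical image

# AGAST accelerated segment test corners with non-maximum suppression.
agast = cv2.AgastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = agast.detect(img, None)
print(f"{len(keypoints)} AGAST corners")

out = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("agast_corners.png", out)
```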
Machine Vision and Perception MVP Group @ TUM 
Vision-Based Navigation with Monocular 
Camera 
Vision-Based Control 
ICRA 2001 & 2003 
V-GPS: navigation part 
IROS 2003 
How can we construct the model on-the-fly?
Machine Vision and Perception MVP Group @ TUM 
Why not Image Jacobian?
Machine Vision and Perception MVP Group @ TUM 
What are we trying to solve? 
How to estimate the relative 
translation T and the rotation R 
between two camera positions 
⇒ Camera Ego-Motion 
Problem: monocular camera projection reduces 
the space by one dimension, therefore, 
external reference (a model) is 
necessary for 6DoF pose estimation 
→ SLAM
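Up to the missing scale noted above, the relative rotation R and translation direction T can be recovered from point correspondences via the essential matrix. A minimal sketch with assumed inputs (opencv-python, a calibrated camera matrix K, matched points pts1/pts2 from feature tracking); the full 6DoF pose with metric scale still needs an external model, which is why the slide points to SLAM.

```python
import cv2
import numpy as np

# Assumed calibration matrix of the camera.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

def relative_pose(pts1, pts2, K):
    """Ego-motion between two views from matched pixel coordinates
    (float32 arrays of shape (N, 2))."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # recoverPose returns R and a unit-norm t: translation only up to scale.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```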
Machine Vision and Perception MVP Group @ TUM 
VSLAM system IROS 2003
Machine Vision and Perception MVP Group @ TUM 
Spherical Image Representation 
o Allows easy mapping on a variety of physical sensors 
o Avoids the angular limitations of a planar image 
o Better describes the physical imaging process
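The spherical representation stores each measurement as a unit viewing ray. A minimal sketch (assuming a pinhole camera with calibration matrix K) that maps a pixel to its point on the unit sphere; other sensor geometries only change this mapping, not the downstream processing.

```python
import numpy as np

def pixel_to_unit_ray(u, v, K):
    """Map a pixel (u, v) of a pinhole camera with calibration matrix K
    to a unit viewing ray, i.e. a point on the unit sphere."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
print(pixel_to_unit_ray(400.0, 300.0, K))
```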
Machine Vision and Perception MVP Group @ TUM 
Work on Optimal Sensor Models
Machine Vision and Perception MVP Group @ TUM 
Omnidirectional Vision
Machine Vision and Perception MVP Group @ TUM 
Recursive Ego-Motion Estimation
Machine Vision and Perception MVP Group @ TUM 
Real Time Pose Tracking
Machine Vision and Perception MVP Group @ TUM 
How can we acquire the geometry?
Machine Vision and Perception MVP Group @ TUM 
Strobl, Mair, Bodenmüller, Kielhofer, Sepp, Suppa, Burschka, Hirzinger 
Feature Propagation 
ž Two motion prediction 
concepts 
— 2D feature propagation by 
motion derivatives 
— IMU-based feature 
prediction 
ž Combination of both: 
— translation propagation by 
feature velocity (2D) 
— rotation propagation by 
gyroscopes 
no feature propagation 
IROS, IEEE/RSJ, 2009, Best Paper Finalist 
Mair, Strobl, Bodenmüller, Suppa, Burschka 
KI, Springer Journal, 2010
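A compact sketch of the two prediction ideas (hypothetical names and simplifications; not the published implementation): a constant-velocity extrapolation of the 2D feature position, and a rotation prediction from integrated gyroscope rates applied to the feature's viewing ray.

```python
import numpy as np

def predict_translational(p_prev, p_curr, dt_prev, dt_next):
    """Constant-velocity 2D propagation: extrapolate the pixel position
    from its last displacement (motion derivative)."""
    velocity = (np.asarray(p_curr) - np.asarray(p_prev)) / dt_prev
    return np.asarray(p_curr) + velocity * dt_next

def predict_rotational(ray, omega, dt, K):
    """Gyroscope-based propagation: rotate the unit viewing ray by the
    integrated angular rate omega (rad/s, small-angle approximation)
    and project it back into the image with camera matrix K."""
    wx, wy, wz = np.asarray(omega) * dt
    Omega = np.array([[0.0, -wz,  wy],
                      [ wz, 0.0, -wx],
                      [-wy,  wx, 0.0]])
    rotated = (np.eye(3) + Omega) @ ray      # first-order rotation
    p = K @ rotated
    return p[:2] / p[2]
```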
Machine Vision and Perception MVP Group @ TUM 
Strobl, Mair, Bodenmüller, Kielhofer, Sepp, Suppa, Burschka, Hirzinger 
Feature Propagation 
ž Two motion prediction 
concepts 
— 2D feature propagation by 
motion derivatives 
— IMU-based feature 
prediction 
ž Combination of both: 
— translation propagation by 
feature velocity (2D) 
— rotation propagation by 
gyroscopes 
IROS, IEEE/RSJ, 2009, Best Paper Finalist 
Mair, Strobl, Bodenmüller, Suppa, Burschka 
linear feature propagation 
KI, Springer Journal, 2010
Machine Vision and Perception MVP Group @ TUM 
Feature Propagation 
ž Two motion prediction 
concepts 
— 2D feature propagation 
by motion derivatives 
— IMU-based feature 
prediction 
ž Combination of both: 
— translation propagation 
by feature velocity (2D) 
— rotation propagation by 
gyroscopes 
Strobl, Mair, Bodenmüller, Kielhofer, Sepp, Suppa, Burschka, Hirzinger 
IROS, IEEE/RSJ, 2009, Best Paper Finalist 
Mair, Strobl, Bodenmüller, Suppa, Burschka 
KI, Springer Journal, 2010 
linear + gyros based prop.
Z∞ – Algorithm at Work Mair, Burschka 
Machine Vision and Perception MVP Group @ TUM 
Mobile Robots Navigation, book chapter, In-Tech, 2010 
Simple sensors, low processing power 
Obstacle avoidance
Machine Vision and Perception MVP Group @ TUM
Machine Vision and Perception MVP Group @ TUM
Machine Vision and Perception MVP Group @ TUM 
Low cost Car Navigation with 
Embedded Systems Burschka, Mair RobotVision 2008
Machine Vision and Perception MVP Group @ TUM 
„Simple“ Image Acquisition 
60 images taken with a standard low cost digital camera
Machine Vision and Perception MVP Group @ TUM 
Estimation of the 6 Degrees of Freedom 
Estimation of 3 rotational angles Estimation of a translation vector
Machine Vision and Perception MVP Group @ TUM 
3D Reconstruction from the Images 
using Navigation Data (courtesy: H.Hirschmüller, DLR)
Collaborative Reconstruction with Self-Localization 
(CVPR Workshop on Vision in Action: Efficient strategies for cognitive agents in complex environments) 
Machine Vision and Perception MVP Group @ TUM 
Collaborative Exploration - Vision in Action: Since we cannot rely on any extrinsic calibration, we perform the calibration of the extrinsic parameters directly from the current observation. We need to find the transformation parameters (R, T) in (3), defining the transformation between the coordinate frames of the two cameras; each camera defines its own coordinate frame:

V2 = R ∗ (V1 + T)    (3)

We rely on the fact that each camera can see its partner and the point it wants to reconstruct at the same time. Camera 1 observes the position of the focal point of Camera 2 along the vector T, and the point P to be reconstructed along V1, simultaneously (Fig. 2). The second camera (Camera 2) uses its own coordinate frame to reconstruct the same point P along the vector V2. A pixel (ui, νi) of a camera (f = 1) is mapped to the unit viewing ray

ni = (ui, νi, 1)T / ||(ui, νi, 1)T||.

2.1 3D Reconstruction from Motion Stereo
In our system, the cameras undergo an arbitrary motion (R, T) which results in two independent observations (n1, n2) of a point P. Equation (3) can be written using (2) as

λ2 n2 = R ∗ (λ1 n1 + T).    (4)

We need to find the radial distances (λ1, λ2) along the incoming rays to estimate the 3D coordinates of the point. We can find them by re-writing (4) to

(−R n1, n2) · (λ1, λ2)T = R · T   ⇒   (λ1, λ2)T = (−R n1, n2)−∗ · R · T = D−∗ · R · T.    (5)

We use in (5) the pseudo-inverse matrix D−∗ to solve for the two unknown distances (λ1, λ2). A pseudo-inverse matrix to D can be calculated according to

D−∗ = (DT · D)−1 · DT.

The pseudo-inverse operation finds a least-squares approximation satisfying the overdetermined set of three equations with two unknowns (λ1, λ2) in (5). Due to calibration and detection errors, the two lines V1 and V2 in Fig. 2 do not necessarily intersect; equation (5) calculates the position of the point along each line closest to the other line.
Fig. 2. Collaborative 3D reconstruction from 2 independently moving cameras.
We decided to use omnidirectional systems instead of fish-eye cameras, because their single view-point property [2] is essential for our combined localization and reconstruction approach (Fig. 3). This property allows an easy recovery of the viewing angle of the virtual camera with the focal point F (Fig. 3) directly from the image coordinates (ui, νi). A standard perspective camera can be mapped onto our generic model of an omnidirectional sensor; its only limitation is the restricted field of view, which causes occlusions between the agents although the target is still in view of both cameras.
Our approach offers a robust initialization method for the system presented in [3]. The original approach relied on an essential-matrix method to initialize the 3D structure in the world. Our system gives a more robust initialization method minimizing the image error directly. The limited space of this paper does not allow a detailed description of this part of the system. The recursive approach from [3] is used to maintain the radial distance λx.
3 Results
Our flying systems use omnidirectional mirrors like the one depicted in Fig. 6 (Fig. 6. Flying agent equipped with an omnidirectional sensor pointing upwards). We tested the system on several indoor and outdoor sequences with two cameras observing the world through different sized planar mirrors (Fig. 4) using a Linux laptop computer with a 1.2 GHz Pentium Centrino processor. The system was equipped with 1 GB RAM and was operating two Firewire cameras with standard PAL resolution of 768x576.
3.1 Accuracy of the Estimation of Extrinsic Parameters
We used the system to estimate the extrinsic motion parameters and achieved results comparable with the extrinsic camera calibration results. We verified the parameters by applying them to the 3D reconstruction process in (5) and achieved measurement accuracy below the resolution of our test system. This reconstruction was in the close range of the system, which explains the high accuracy.
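Equation (5) is easy to exercise numerically. The sketch below (numpy, synthetic data) solves for the radial distances λ1, λ2 with the pseudo-inverse D−∗ = (DT·D)−1·DT and reconstructs the point in both camera frames.

```python
import numpy as np

def triangulate(n1, n2, R, T):
    """Solve lambda2*n2 = R*(lambda1*n1 + T) for the radial distances
    (lambda1, lambda2) via the pseudo-inverse of D = (-R*n1, n2)."""
    D = np.column_stack((-R @ n1, n2))
    D_pinv = np.linalg.inv(D.T @ D) @ D.T          # (D^T D)^-1 D^T
    lam = D_pinv @ (R @ T)
    return lam, lam[0] * n1, lam[1] * n2           # point in both camera frames

# Synthetic check: a point P observed from two poses related by (R, T).
P1 = np.array([0.4, -0.2, 3.0])                    # point in camera-1 frame
R = np.eye(3)                                      # assumed relative rotation
T = np.array([0.5, 0.0, 0.0])                      # translation parameter of eq. (3)
P2 = R @ (P1 + T)                                  # same point in camera-2 frame
n1, n2 = P1 / np.linalg.norm(P1), P2 / np.linalg.norm(P2)

lam, X1, X2 = triangulate(n1, n2, R, T)
print("lambda1, lambda2:", lam)                    # should equal ||P1||, ||P2||
```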
Machine Vision and Perception MVP Group @ TUM 
Source: DLR Perception Group
Machine Vision and Perception MVP Group @ TUM 
Asynchronous Stereo for Dynamic Scenes (VISAPP 2014) 
Figure 2: C0 and C1 are the camera centers of the stereo pair; P0, P1, P2 are the 3D poses of the point at times t0, t1, t2, which correspond to the frame acquisition timestamps of camera C0. P* is the 3D pose of the point at time t*, which corresponds to the frame acquisition timestamp of camera C1. Vectors v0, v1, v2 are unit vectors pointing from the camera centers towards these points; the two cameras are synchronized over NTP. 
3.2 Path Reconstruction: in the second stage the 3D poses Pi are reconstructed from the image coordinates (ai, bi) and the depths zi as Pi = (ai·zi, bi·zi, zi)T.
Machine Vision and Perception MVP Group @ TUM 
How to reconstruct 3D under poor texture 
conditions? 
Problem: texture information is more 
sparse 
Machine Vision and Perception MVP Group @ TUM 
What can we do if the texture information 
is almost non-existent? 
→ photogrammetric approach
Machine Vision and Perception MVP Group @ TUM 
Reconstruction Example 
Works well under static lighting conditions and on roughly Lambertian surfaces 
Ruepp and Burschka. Fast recovery of weakly textured surfaces from monocular image 
sequences. (ACCV2010)
Machine Vision and Perception MVP Group @ TUM 
Point Spread Function (PSF)
Point Light Sources 
Machine Vision and Perception MVP Group @ TUM 
For point light sources 
f(i, j) = δ(i, j) ⇒ g(i, j) = h(i, j) 
(PSF extracted by thresholding)
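The statement is the convolution identity g = f ∗ h: imaging a point light source (a discrete delta) returns the PSF itself. A small numpy check with a hypothetical Gaussian PSF:

```python
import numpy as np

# Hypothetical Gaussian PSF h (normalized).
x = np.arange(-3, 4)
g1d = np.exp(-x**2 / 2.0)
h = np.outer(g1d, g1d)
h /= h.sum()

# Scene f: a single point light source (discrete delta) on a dark background.
f = np.zeros((32, 32))
f[16, 16] = 1.0

# Image formation g = f * h via FFT-based (circular) convolution.
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))

# The observed image reproduces the PSF, shifted to the position of the delta.
print(np.isclose(g.max(), h.max()))
```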
Machine Vision and Perception MVP Group @ TUM 
Motion Blur to Support Tracking
Machine Vision and Perception MVP Group @ TUM 
Cepstrum 
The Cepstrum is the Fourier transformation of the log spectrum of an image; it is therefore a tool for analyzing the frequency domain of an image
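Following that definition, a minimal numpy sketch of a 2D cepstrum (the input is a hypothetical grayscale array):

```python
import numpy as np

def cepstrum_2d(image):
    """Cepstrum as defined on the slide: Fourier transform of the
    log (power) spectrum of the image."""
    power_spectrum = np.abs(np.fft.fft2(image)) ** 2
    log_spectrum = np.log(power_spectrum + 1e-12)   # avoid log(0)
    return np.real(np.fft.fft2(log_spectrum))

# On a motion-blurred image, a blur of length d shows up as a negative
# peak at distance d from the origin of the cepstrum (see next slide).
```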
Examples 
Machine Vision and Perception MVP Group @ TUM 
H(u,v) is a periodic function with period T = 1/d; therefore there is a zero crossing every T = 1/d. The convolution operation is transformed in the frequency domain into a multiplication of the two spectra; as a result, the power spectrum of the blur PSF appears as a ripple in the power spectrum of the blurred image. This ripple can be identified by a negative peak in the Cepstrum domain.
Machine Vision and Perception MVP Group @ TUM 
Automotive 
ž 18 Kameras für 360° 
Stereoabdeckung 
ž Ultraschall 
ž IMU + Dual dGPS 
ž Car2X Modul 
ž Telepräsenz Wifi Modul
Machine Vision and Perception MVP Group @ TUM 
RoMo’s Camera System 
ž Optimized camera placement with 
Dymola 
ž DLR Visualization Library
Machine Vision and Perception MVP Group @ TUM 
3D Bird-View
Machine Vision and Perception MVP Group @ TUM 
Navigation alternatives 
- strategy vs. instincts
Machine Vision and Perception MVP Group @ TUM 
How to parse complex 
situations in a robust way?
Machine Vision and Perception MVP Group @ TUM 
Matching modalities in the images 
• Direct image content 
• Texture/Pattern 
• Color 
• Pre-processed image features 
Only image data 
• Lines 
• Corners 
• Keypoints + Descriptors (SIFT, SURF, FAST, AGAST) 
• Derived features 
With external data 
• Depth information 
• Homographies 
• Structural relations between images (e.g. plane tracking)
Machine Vision and Perception MVP Group @ TUM 
Collision estimation for static and 
dynamic objects
Machine Vision and Perception MVP Group @ TUM 
Monocular Clustering of Objects
Machine Vision and Perception MVP Group @ TUM 
TTC from optical flow 
Schaub Burschka, IV2013
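Time-to-contact can be illustrated without metric depth: the relative expansion of the image distance between two tracked points on the same object gives TTC directly. A rough sketch under a constant-velocity assumption (hypothetical inputs; not the IV2013 method itself):

```python
import numpy as np

def time_to_contact(d_prev, d_curr, dt):
    """TTC from the scale change of the image distance d between two
    tracked points on the same object: tau = d / (dd/dt)."""
    d_dot = (d_curr - d_prev) / dt
    if d_dot <= 0:
        return np.inf            # the object is not approaching
    return d_curr / d_dot

# Example: the two points move apart from 40 px to 44 px within 1/30 s.
print(time_to_contact(40.0, 44.0, 1.0 / 30.0), "s")
```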
Current state of the art in manipulation... ? 
Machine Vision and Perception MVP Group @ TUM 
Manipulation clip from The Big Bang Theory
How can we automate manipulation? 
Machine Vision and Perception MVP Group @ TUM 
labeling motion parameters
Machine Vision and Perception MVP Group @ TUM 
What is in the scene? (labeling step)
Machine Vision and Perception MVP Group @ TUM 
IJRR 2012 Special Issue, Papazov et al.
Machine Vision and Perception MVP Group @ TUM 
What happens if an object is not in 
the database? 
Indexing to the Atlas database needs 
to be extended to object classes 
-> deformable shape registration 
needed 
Atlas information Observed object
Machine Vision and Perception MVP Group @ TUM 
Deformable Registration from 
generic models (special issue SGP'11 Papazov et al.) 
Matching of a detailed shape to 
a primitive prior 
The manipulation “heat map” from the 
generic model gets propagated
Deformable Registration 
Machine Vision and Perception MVP Group @ TUM 
(special issue SGP 11, Papazov et al) 
Input data
Machine Vision and Perception MVP Group @ TUM 
Deformable 3D Shape Registration 
Based on Local Similarity Transforms 
MVP
Machine Vision and Perception MVP Group @ TUM 
Physical and Geometric Properties of an Object 
(Object Container) (ICRA 2012, Petsch et al.)
Machine Vision and Perception MVP Group @ TUM 
Functional Properties of an Object 
stored in Functionality Map
Machine Vision and Perception MVP Group @ TUM 
Where else do we need embedded 
perception? 
" No external navigation aids (GNSS) 
" No reliable (high bandwidth, low latency) radio link 
" Full on-board navigation solution
Machine Vision and Perception MVP Group @ TUM 
Mixed indoor/outdoor exploration 
" Autonomous indoor/outdoor 
flight of 60m 
" Mapping resolution: 0.1m 
" Leaving through a window 
" Returning through door
Machine Vision and Perception MVP Group @ TUM
Machine Vision and Perception MVP Group @ TUM 
Vision Based Haptic Multisensor for 
Manipulation of Soft, Fragile 
Objects
Machine Vision and Perception MVP Group @ TUM 
Surface Response for different types of Objects: hard, soft, deformable
Machine Vision and Perception MVP Group @ TUM 
Conclusion 
Tracking finds correspondences between two or more images using different types of information and has varying sensitivity to errors 
• Direct image content 
• Texture/Pattern 
• Color 
• Pre-processed image features 
• Lines 
• Corners 
• Keypoints + Descriptors (SIFT, SURF, FAST, AGAST) 
• Derived features 
• Depth information 
• Homographies 
• Structural relations between images (e.g. plane tracking)
Machine Vision and Perception MVP 
Group @ TUM 
Research of the MVP Group 
Visual navigation 
The Machine Vision and 
Perception Group @TUM works 
on the aspects of visual 
perception and control in 
medical, mobile, and HCI 
applications 
Biologically motivated 
perception 
Perception for manipulation 
Visual Action Analysis 
Photogrammetric monocular 
reconstruction 
Rigid and Deformable 
Registration
Machine Vision and Perception MVP 
Group @ TUM 
Research of the MVP Group 
Sensor substitution 
Exploration of physical 
object properties 
Development of new 
Optical Sensors 
Multimodal Sensor 
Fusion 
The Machine Vision and 
Perception Group @TUM works 
on the aspects of visual 
perception and control in 
medical, mobile, and HCI 
applications
Machine Vision and Perception MVP Group @ TUM 
MVP 
Research at DLR
