Machine Learning of Structured Outputs

Christoph Lampert
IST Austria (Institute of Science and Technology Austria)
Klosterneuburg

Feb 2, 2011
Machine Learning of Structured Outputs



Overview...
   Introduction to Structured Learning
   Structured Support Vector Machines
   Applications in Computer Vision




Slides available at
http://www.ist.ac.at/~chl
What is Machine Learning?

Definition [T. Mitchell]:
Machine Learning is the study of computer algorithms
that improve their performance in a certain task
through experience.

    Example: Backgammon
        Task: play backgammon
        Experience: self-play
        Performance measure: games won against humans

    Example: Object Recognition
        Task: determine which objects are visible in images
        Experience: annotated training data
        Performance measure: objects recognized correctly
What is structured data?

 Definition [ad hoc]:
 Data is structured if it consists of several parts, and
 not only the parts themselves contain information, but
 also the way in which the parts belong together.




            Text           Molecules / Chemical Structures




    Documents/HyperText               Images
The right tool for the problem.


 Example: Machine Learning for/of Structured Data




          image             body model            model fit

     Task: human pose estimation
     Experience: images with manually annotated body pose
     Performance measure: number of correctly localized body parts
Other tasks:

     Natural Language Processing:
         Automatic Translation (output: sentences)
         Sentence Parsing (output: parse trees)

     Bioinformatics:
         RNA Structure Prediction (output: bipartite graphs)
         Enzyme Function Prediction (output: path in a tree)

     Speech Processing:
         Automatic Transcription (output: sentences)
         Text-to-Speech (output: audio signal)

     Robotics:
         Planning (output: sequence of actions)

 This talk: only Computer Vision examples
"Normal" Machine Learning:
                     f : X → R.
  inputs X can be any kind of objects
     images, text, audio, sequence of amino acids, . . .
  output y is a real number
     classification, regression, . . .
  many ways to construct f :
     f (x) = a · ϕ(x) + b,
     f (x) = decision tree,
     f (x) = neural network
Structured Output Learning:
                    f : X → Y.
  inputs X can be any kind of objects
  outputs y ∈ Y are complex (structured) objects
     images, parse trees, folds of a protein, . . .
  how to construct f ?
Predicting Structured Outputs: Image Denoising




    f : input images → output: denoised images

   input set X = {grayscale images} ≙ [0, 255]^{M·N}

   output set Y = {grayscale images} ≙ [0, 255]^{M·N}

   energy minimization: f(x) := argmin_{y∈Y} E(x, y)

   E(x, y) = λ ∑_i (x_i − y_i)² + µ ∑_{i,j} |y_i − y_j|
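As a concrete illustration, here is a minimal sketch (not from the slides) that evaluates this denoising energy for a small image, assuming a 4-connected pixel neighborhood; the values of λ and µ are placeholders:

```python
import numpy as np

def denoising_energy(x, y, lam=1.0, mu=0.1):
    """E(x, y) = lam * sum_i (x_i - y_i)^2 + mu * sum_{i,j} |y_i - y_j|,
    with (i, j) ranging over horizontally/vertically adjacent pixels."""
    data_term = lam * np.sum((x - y) ** 2)
    smoothness = mu * (np.sum(np.abs(y[:, 1:] - y[:, :-1]))
                       + np.sum(np.abs(y[1:, :] - y[:-1, :])))
    return data_term + smoothness

x = np.random.randint(0, 256, size=(8, 8)).astype(float)   # noisy input
print(denoising_energy(x, x))                               # y = x: zero data term
print(denoising_energy(x, np.full_like(x, x.mean())))       # constant y: zero smoothness
```

Evaluating E is easy; the hard part is the argmin over all images y, for which the slides return to combinatorial algorithms (graph cuts, belief propagation) later.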
Predicting Structured Outputs: Human Pose Estimation




    input: image → body model → output: model fit

   input set X = {images}

   output set Y = {positions/angles of K body parts} ≙ R^{4K}

   energy minimization: f(x) := argmin_{y∈Y} E(x, y)

   E(x, y) = ∑_i w_i ϕ_fit(x_i, y_i) + ∑_{i,j} w_ij ϕ_pose(y_i, y_j)
Predicting Structured Outputs: Shape Matching




  input: image pairs

  output: mapping y : x_i ↔ y(x_i)

        scoring function:
        F(x, y) = ∑_i w_i ϕ_sim(x_i, y(x_i)) + ∑_{i,j} w_ij ϕ_dist(x_i, x_j, y(x_i), y(x_j))

        predict f : X → Y by f(x) := argmax_{y∈Y} F(x, y)

[J. McAuley et al.: "Robust Near-Isometric Matching via Structured Learning of Graphical Models", NIPS, 2008]
Predicting Structured Outputs: Tracking (by Detection)




  input: image → output: object position

        input set X = {images}

        output set Y = R² (box center) or R⁴ (box coordinates)

        predict f : X → Y by f(x) := argmax_{y∈Y} F(x, y)

        scoring function F(x, y) = ⟨w, ϕ(x, y)⟩, e.g. an SVM score

images: [C. L., Jan Peters, "Active Structured Learning for High-Speed Object Detection", DAGM 2009]
Predicting Structured Outputs: Summary


Image Denoising
  y = argmin_y E(x, y),   E(x, y) = w₁ ∑_i (x_i − y_i)² + w₂ ∑_{i,j} |y_i − y_j|

Pose Estimation
  y = argmin_y E(x, y),   E(x, y) = ∑_i w_i ϕ(x_i, y_i) + ∑_{i,j} w_ij ϕ(y_i, y_j)

Point Matching
  y = argmax_y F(x, y),   F(x, y) = ∑_i w_i ϕ(x_i, y_i) + ∑_{i,j} w_ij ϕ(y_i, y_j)

Tracking
  y = argmax_y F(x, y),   F(x, y) = ⟨w, ϕ(x, y)⟩
Unified Formulation
Predict the structured output by maximization

             y = argmax_{y∈Y} F(x, y)

of a compatibility function

          F(x, y) = ⟨w, ϕ(x, y)⟩

that is linear in a parameter vector w.
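In code, the unified formulation is a linear score followed by an argmax. A minimal sketch, assuming the output set Y is small enough to enumerate (`phi`, `w`, and `Y` are placeholders; for genuinely structured Y the enumeration is replaced by the combinatorial algorithms discussed on the next slide):

```python
import numpy as np

def predict(x, w, phi, Y):
    """f(x) = argmax_{y in Y} <w, phi(x, y)>, by exhaustive enumeration."""
    scores = [np.dot(w, phi(x, y)) for y in Y]
    return Y[int(np.argmax(scores))]
```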
Structured Prediction: how to evaluate argmaxy F (x, y)?




 chain                                               tree
    loop-free graphs: Shortest-Path / Belief Propagation (BP)




 grid                                            arbitrary graph
    loopy graphs: GraphCut, approximate inference (e.g. loopy BP)

   Structured Learning: how to learn F (x, y) from examples?
Machine Learning for Structured Outputs

 Learning Problem:
     Task: predict structured objects f : X → Y
     Experience: example pairs {(x¹, y¹), . . . , (x^N, y^N)} ⊂ X × Y:
     typical inputs with "correct" outputs for them.
     Performance measure: ∆ : Y × Y → R

 Our choice:
     parametric family: F(x, y; w) = ⟨w, ϕ(x, y)⟩
     prediction method: f(x) = argmax_{y∈Y} F(x, y; w)
     Task: determine a "good" w
Reminder: regularized risk minimization
 Find w for the decision function F = ⟨w, ϕ(x, y)⟩ by

     min_{w∈R^d}   λ‖w‖² + ∑_{n=1}^N ℓ(y^n, F(x^n, ·; w))

                 Regularization + empirical loss (on training data)

     Logistic loss: Conditional Random Fields
          ℓ(y^n, F(x^n, ·; w)) = log ∑_{y∈Y} exp[ F(x^n, y; w) − F(x^n, y^n; w) ]

     Hinge loss: Maximum-Margin Training
          ℓ(y^n, F(x^n, ·; w)) = max_{y∈Y} ∆(y^n, y) + F(x^n, y; w) − F(x^n, y^n; w)

     Exponential loss: Boosting
          ℓ(y^n, F(x^n, ·; w)) = ∑_{y∈Y\{y^n}} exp[ F(x^n, y; w) − F(x^n, y^n; w) ]
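To make the three surrogate losses concrete, here is a small sketch (not from the slides) that evaluates all of them for one training example, assuming the scores F(x^n, y; w) over a small label set are given as a vector; the numbers are illustrative only:

```python
import numpy as np

def surrogate_losses(scores, delta, n):
    """scores[y] = F(x^n, y; w); delta[y] = Delta(y^n, y); n = index of y^n."""
    margins = scores - scores[n]                 # F(x^n, y; w) - F(x^n, y^n; w)
    logistic = np.log(np.sum(np.exp(margins)))   # CRF training loss
    hinge = np.max(delta + margins)              # margin-rescaled hinge (S-SVM)
    mask = np.arange(len(scores)) != n
    exponential = np.sum(np.exp(margins[mask]))  # boosting-style loss
    return logistic, hinge, exponential

print(surrogate_losses(np.array([2.0, 1.0, -0.5]),
                       np.array([0.0, 1.0, 1.0]), n=0))
```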
Maximum Margin Training
 of Structured Models
     (Structured SVMs)
Structured Support Vector Machine


Structured Support Vector Machine:

   min_{w∈R^d}   ½‖w‖² + (C/N) ∑_{n=1}^N max_{y∈Y} [ ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ]

 Unconstrained optimization; convex, non-differentiable objective.



[I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun. "Large Margin Methods for Structured and Interdependent
Output Variables", JMLR, 2005.]
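The slides solve this problem by constraint generation (later in the talk), but the unconstrained form also admits a simple subgradient method. A rough sketch under that substitution, assuming problem-specific helpers `phi` and `loss_augmented_argmax` (returning argmax_y ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩); step size and iteration count are placeholders:

```python
import numpy as np

def ssvm_subgradient(data, phi, loss_augmented_argmax, dim,
                     C=1.0, steps=100, eta=0.01):
    """Subgradient descent on 1/2 ||w||^2 + (C/N) sum_n max_y [...]."""
    w = np.zeros(dim)
    N = len(data)
    for _ in range(steps):
        g = w.copy()                                  # gradient of 1/2 ||w||^2
        for x_n, y_n in data:
            y_hat = loss_augmented_argmax(w, x_n, y_n)
            # subgradient of the max term at the maximizer y_hat:
            g += (C / N) * (phi(x_n, y_hat) - phi(x_n, y_n))
        w -= eta * g
    return w
```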
S-SVM Objective Function for w ∈ R²:

[Contour plots of the S-SVM objective for C = 0.01, C = 0.10, C = 1.00, and C → ∞.]
Structured Support Vector Machine:

   min_{w∈R^d}   ½‖w‖² + (C/N) ∑_{n=1}^N max_{y∈Y} [ ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ]

Unconstrained optimization; convex, non-differentiable objective.
Structured SVM (equivalent formulation):

     min_{w∈R^d, ξ∈R^N₊}   ½‖w‖² + (C/N) ∑_{n=1}^N ξ^n

subject to, for n = 1, . . . , N,

     max_{y∈Y} [ ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ] ≤ ξ^n

N non-linear constraints; convex, differentiable objective.
Structured SVM (also equivalent formulation):

     min_{w∈R^d, ξ∈R^N₊}   ½‖w‖² + (C/N) ∑_{n=1}^N ξ^n

subject to, for n = 1, . . . , N,

     ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ≤ ξ^n,   for all y ∈ Y

N · |Y| linear constraints; convex, differentiable objective.
Example: A "True" Multiclass SVM

     Y = {1, 2, . . . , K},     ∆(y, y′) = 1 for y ≠ y′, and 0 otherwise.

     ϕ(x, y) = ( ⟦y = 1⟧ Φ(x), ⟦y = 2⟧ Φ(x), . . . , ⟦y = K⟧ Φ(x) )
             = e_y ⊗ Φ(x),     with e_y the y-th unit vector

Solve:

          min_{w,ξ}   ½‖w‖² + (C/N) ∑_{n=1}^N ξ^n

subject to, for n = 1, . . . , N,

          ⟨w, ϕ(x^n, y^n)⟩ − ⟨w, ϕ(x^n, y)⟩ ≥ 1 − ξ^n     for all y ∈ Y.

Classification: MAP     f(x) = argmax_{y∈Y} ⟨w, ϕ(x, y)⟩

                   ⇒ Crammer–Singer Multiclass SVM
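The block structure of this joint feature map is easy to spell out in code. A small sketch, assuming Φ(x) is given as a vector and labels are 0-indexed:

```python
import numpy as np

def joint_feature(Phi_x, y, K):
    """phi(x, y): Phi(x) placed in the y-th of K blocks, zeros elsewhere."""
    d = len(Phi_x)
    phi = np.zeros(K * d)
    phi[y * d:(y + 1) * d] = Phi_x
    return phi

print(joint_feature(np.array([0.5, -1.0, 2.0]), y=1, K=3))
# [ 0.   0.   0.   0.5 -1.   2.   0.   0.   0. ]
```

With this map, ⟨w, ϕ(x, y)⟩ = ⟨w_y, Φ(x)⟩ for the y-th block w_y of w, which is exactly the Crammer–Singer per-class score.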
Hierarchical Multiclass Classification

Loss function can reflect the hierarchy:          [tree: (cat, dog) (car, bus)]

   ∆(y, y′) := ½ (distance in tree)
   ∆(cat, cat) = 0,   ∆(cat, dog) = 1,   ∆(cat, bus) = 2,   etc.

Solve:

     min_{w,ξ}   ½‖w‖² + (C/N) ∑_{n=1}^N ξ^n

subject to, for n = 1, . . . , N,

     ⟨w, ϕ(x^n, y^n)⟩ − ⟨w, ϕ(x^n, y)⟩ ≥ ∆(y^n, y) − ξ^n   for all y ∈ Y.
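A sketch of this tree-induced loss for the four-class hierarchy on the slide; the two-level parent map (animal: cat, dog; vehicle: car, bus) is an assumption matching the figure:

```python
# Delta(y, y') = half the number of edges on the path between leaves y and y'.
parents = {"cat": "animal", "dog": "animal",
           "car": "vehicle", "bus": "vehicle",
           "animal": "root", "vehicle": "root"}

def path_to_root(y):
    path = [y]
    while path[-1] != "root":
        path.append(parents[path[-1]])
    return path

def tree_loss(y1, y2):
    a, b = path_to_root(y1), path_to_root(y2)
    shared = len(set(a) & set(b))              # common ancestors (incl. root)
    return 0.5 * ((len(a) - shared) + (len(b) - shared))

print(tree_loss("cat", "cat"), tree_loss("cat", "dog"), tree_loss("cat", "bus"))
# 0.0 1.0 2.0
```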
Kernelized S-SVM problem:
Define
    joint kernel function k : (X × Y) × (X × Y) → R,
    kernel matrix K_{nn′yy′} = k((x^n, y), (x^{n′}, y′)).

    max_{α∈R₊^{N·|Y|}}   ∑_{n=1}^N ∑_{y∈Y} α_{ny} ∆(y^n, y) − ½ ∑_{n,n′=1}^N ∑_{y,y′∈Y} α_{ny} α_{n′y′} K_{nn′yy′}

    subject to, for n = 1, . . . , N,

        ∑_{y∈Y} α_{ny} ≤ C/N.

    Kernelized prediction function:

        f(x) = argmax_{y∈Y} ∑_{ny} α_{ny} k((x^n, y^n), (x, y))

Too many variables: train with a working set of the α_{ny}.
Applications
in Computer Vision
Example 1: Category-Level Object Localization




             What objects are present? person, car
Example 1: Category-Level Object Localization




                    Where are the objects?
Object Localization ⇒ Scene Interpretation




  A man inside of a car         A man outside of a car
  ⇒ He’s driving.               ⇒ He’s passing by.
Object Localization as Structured Learning:
    Given: training examples (x^n, y^n)_{n=1,...,N}
    Wanted: prediction function f : X → Y where
             X = {all images}
             Y = {all boxes}

    [Illustration: f_car maps an image to the bounding box of the car in it.]
Structured SVM framework

Define:
    feature function ϕ : X × Y → R^d,
    loss function ∆ : Y × Y → R,
    routine to solve argmax_{y∈Y} ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩.

Solve:     min_{w,ξ}   ½‖w‖² + C ∑_{n=1}^N ξ^n     subject to

          ∀y ∈ Y : ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ≤ ξ^n.

Result:
    w* that determines the scoring function F(x, y) = ⟨w*, ϕ(x, y)⟩,
    localization function: f(x) = argmax_y F(x, y).

• M. Blaschko, C.L.: Learning to Localize Objects with Structured Output Regression, ECCV 2008.
Feature function: how to represent an (image, box)-pair (x, y)?

Observation: whether y is the right box for x depends only on x|_y, the image content inside the box.

                          ϕ(x, y) := h(x|_y)

where h(r) is a (bag-of-visual-words) histogram representation of the region r.

[Illustration: boxes covering similar objects in different images yield similar histograms, ϕ(x, y) = h(x|_y) ≈ h(x′|_y′) = ϕ(x′, y′).]
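A sketch of ϕ(x, y) = h(x|_y), assuming each pixel already carries a quantized visual-word index (a real pipeline would compute these from local descriptors such as SIFT; the box layout is an assumption of this sketch):

```python
import numpy as np

def box_histogram(x, y, num_words):
    """phi(x, y) = h(x|_y): x is an H x W array of visual-word indices,
    y = (left, top, right, bottom) a box in pixel coordinates."""
    l, t, r, b = y
    region = x[t:b, l:r]                          # x|_y: content inside the box
    h = np.bincount(region.ravel(), minlength=num_words).astype(float)
    return h / max(h.sum(), 1.0)                  # normalized bag of visual words

x = np.random.randint(0, 50, size=(60, 80))       # stand-in codeword image
print(box_histogram(x, (10, 5, 40, 30), num_words=50).shape)   # (50,)
```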
Structured SVM framework

Define:
    feature function ϕ : X × Y → R^d,
    loss function ∆ : Y × Y → R,
    routine to solve argmax_{y∈Y} ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩.

Solve:     min_{w,ξ}   ½‖w‖² + C ∑_{n=1}^N ξ^n     subject to

          ∀y ∈ Y : ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ≤ ξ^n.

Result:
    w* that determines the scoring function F(x, y) = ⟨w*, ϕ(x, y)⟩,
    localization function: f(x) = argmax_y F(x, y).

• M. Blaschko, C.L.: Learning to Localize Objects with Structured Output Regression, ECCV 2008.
Loss function: how to compare two boxes y and y′?

         ∆(y, y′) := 1 − (area overlap between y and y′)

                   = 1 − area(y ∩ y′) / area(y ∪ y′)
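This area-overlap loss is straightforward to implement for axis-aligned boxes given as (left, top, right, bottom):

```python
def box_loss(y, y2):
    """Delta(y, y') = 1 - IoU(y, y') for boxes (left, top, right, bottom)."""
    il, it = max(y[0], y2[0]), max(y[1], y2[1])
    ir, ib = min(y[2], y2[2]), min(y[3], y2[3])
    inter = max(0, ir - il) * max(0, ib - it)          # area(y ∩ y')
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])
    union = area(y) + area(y2) - inter                 # area(y ∪ y')
    return 1.0 - inter / union if union > 0 else 1.0

print(box_loss((0, 0, 10, 10), (0, 0, 10, 10)))   # 0.0: identical boxes
print(box_loss((0, 0, 10, 10), (5, 0, 15, 10)))   # 1 - 50/150 = 0.666...
```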
Structured SVM framework

Define:
    feature function ϕ : X × Y → R^d,
    loss function ∆ : Y × Y → R,
    routine to solve argmax_{y∈Y} ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩.

Solve:     min_{w,ξ}   ½‖w‖² + C ∑_{n=1}^N ξ^n     subject to

          ∀y ∈ Y : ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ≤ ξ^n.

Result:
    w* that determines the scoring function F(x, y) = ⟨w*, ϕ(x, y)⟩,
    localization function: f(x) = argmax_y F(x, y).

• M. Blaschko, C.L.: Learning to Localize Objects with Structured Output Regression, ECCV 2008.
How to solve   f(x) = argmax_y ∆(y^n, y) + ⟨w, ϕ(x^n, y)⟩ ?

Option 1) Sliding Window: evaluate the loss-augmented score for every candidate box, e.g.

    1 − 0.3 = 0.7
    1 − 0.8 = 0.2
    1 − 0.1 = 0.9
    1 − 0.2 = 0.8
    ...
    0.3 + 1.4 = 1.7
    0 + 1.5 = 1.5
    ...
    1 − 1.2 = −0.2
    1 − 0.3 = 0.7

Option 2) Branch-and-Bound Search (another talk)

• C.L., M. Blaschko, T. Hofmann: Beyond Sliding Windows: Object Localization by Efficient Subwindow Search, CVPR 2008.
Structured Support Vector Machine

    S-SVM Optimization:     min_{w,ξ}   ½‖w‖² + C ∑_{n=1}^N ξ^n

    subject to, for n = 1, . . . , N :

      ∀y ∈ Y : ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ≤ ξ^n.

    Solve via constraint generation:
    Iterate:
         Solve the minimization with a working set of constraints: new w
         Identify argmax_{y∈Y} ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩
         Add violated constraints to the working set and iterate
    Polynomial-time convergence to any precision ε
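A condensed sketch of this loop; `solve_qp` (the quadratic program restricted to the working set), `phi`, `delta`, and `loss_augmented_argmax` are placeholders for problem-specific components:

```python
import numpy as np

def cutting_plane_ssvm(data, phi, delta, loss_augmented_argmax, solve_qp,
                       C=1.0, eps=1e-3, max_iter=50):
    """Constraint generation for the S-SVM; returns the learned w."""
    working_set = []                       # constraints stored as (n, y_hat)
    for _ in range(max_iter):
        w, xi = solve_qp(working_set, C)   # optimum under current constraints
        added = 0
        for n, (x_n, y_n) in enumerate(data):
            y_hat = loss_augmented_argmax(w, x_n, y_n)   # most violated output
            slack = (delta(y_n, y_hat)
                     + np.dot(w, phi(x_n, y_hat) - phi(x_n, y_n)) - xi[n])
            if slack > eps:                # violated by more than precision eps
                working_set.append((n, y_hat))
                added += 1
        if added == 0:                     # all constraints satisfied: done
            return w
    return w
```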
Example: Training set (x¹, y¹), . . . , (x⁴, y⁴)
Initialize: no constraints

     Solve minimization with working set of constraints ⇒ w = 0
     Identify argmax_{y∈Y} ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩
            ⟨w, ϕ(x^n, y)⟩ = 0 → pick any window with ∆(y, y^n) = 1

     Add violated constraints to working set and iterate:

        ⟨w, ϕ(x^n, y^n)⟩ − ⟨w, ϕ(x^n, ŷ^n)⟩ ≥ 1   for n = 1, . . . , 4,

     where ŷ^n is the picked window (the slide shows each constraint with the correct and the picked image crop).
Working set of constraints:

        ⟨w, ϕ(x^n, y^n)⟩ − ⟨w, ϕ(x^n, ŷ^n)⟩ ≥ 1   for n = 1, . . . , 4.

     Solve minimization with working set of constraints
     Identify argmax_{y∈Y} ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩
     Add violated constraints to working set and iterate:

     four new constraints of the same form, with right-hand sides 1, 0.9, 0.8, and 0.01
     (the loss ∆(y^n, ŷ) of each newly found window).
Working set of constraints:

     all eight constraints so far, with right-hand sides 1, 1, 1, 0.9, 1, 0.8, 1, and 0.01.

     Solve minimization with working set of constraints
     Identify argmax_{y∈Y} ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩
     Add violated constraints to working set and iterate, . . .
S-SVM Optimization:     min_{w,ξ}   ½‖w‖² + C ∑_{n=1}^N ξ^n

subject to, for n = 1, . . . , N :

∀y ∈ Y : ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ≤ ξ^n.

Solve via constraint generation:
Iterate:
     Solve minimization with working set of constraints
     Identify argmax_{y∈Y} ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩
     Add violated constraints to working set and iterate

Similar to classical bootstrap training, but:
     forces a margin between correct and incorrect location scores,
     handles overlapping detections by fractional scores.
Results: PASCAL VOC 2006




            Example detections for VOC 2006 bicycle, bus and cat.




           Precision–recall curves for VOC 2006 bicycle, bus and cat.


    Structured training improves detection accuracy.
More Recent Results (PASCAL VOC 2009)




aeroplane
More Recent Results (PASCAL VOC 2009)




horse
More Recent Results (PASCAL VOC 2009)




sheep
More Recent Results (PASCAL VOC 2009)




sofa
Why does it work?




         Learned weights from binary (center) and structured training (right).



    Both training methods: positive weights at object region.
    Structured training: negative weights for features just outside
    the bounding box position.
    Posterior distribution over box coordinates becomes more
    peaked.
Example II: Category-Level Object Segmentation




                 Where exactly are the objects?
Segmentation as Structured Learning:
    Given: training examples (x^n, y^n)_{n=1,...,N}

    [example images with their ground-truth segmentation masks]

    Wanted: prediction function f : X → Y with
          X = {all images}
          Y = {all binary segmentations}
Structured SVM framework

Define:
    Feature functions ϕ : X × Y → R^d :
            unary terms ϕ_i(x, y_i) for each pixel i
            pairwise terms ϕ_ij(x, y_i, y_j) for neighbors (i, j)
    Loss function ∆ : Y × Y → R,
            ideally decomposing like ϕ.

Solve:     min_{w,ξ}   ½‖w‖² + C ∑_{n=1}^N ξ^n     subject to

         ∀y ∈ Y : ∆(y, y^n) + ⟨w, ϕ(x^n, y)⟩ − ⟨w, ϕ(x^n, y^n)⟩ ≤ ξ^n.

Result:
    w* that determines the scoring function F(x, y) = ⟨w*, ϕ(x, y)⟩,
    segmentation function: f(x) = argmax_y F(x, y).
Example choices:

Feature functions, unary terms c = {i}:

    ϕ_i(x, y_i) = (0, h(x_i))   for y_i = 0,
                  (h(x_i), 0)   for y_i = 1,

    where h(x_i) is the color histogram of the pixel i.

Feature functions, pairwise terms c = {i, j}:
    ϕ_ij(x, y_i, y_j) = ⟦y_i = y_j⟧.

Loss function: Hamming loss
    ∆(y, y′) = ∑_i ⟦y_i ≠ y′_i⟧
How to solve   argmax_y ∆(y^n, y) + F(x^n, y) ?

          ∆(y^n, y) + F(x^n, y)
                          = ∑_i ⟦y_i^n ≠ y_i⟧ + ∑_i w_i h(x_i^n) + ∑_{ij} w_ij ⟦y_i = y_j⟧

                          = ∑_i [ w_i h(x_i^n) + ⟦y_i^n ≠ y_i⟧ ] + ∑_{ij} w_ij ⟦y_i = y_j⟧

        if w_ij ≥ 0 (which makes sense), then E := −F is submodular.
        use the GraphCut algorithm to find the global optimum efficiently.
                also possible: (loopy) belief propagation, variational inference,
                greedy search, simulated annealing, . . .
•   [M. Szummer, P. Kohli: "Learning CRFs using graph cuts", ECCV 2008]
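For intuition, here is a brute-force version of the loss-augmented maximization on a tiny grid; it enumerates all labelings, which real systems replace by the graph-cut computation described above. The 4-neighbor grid and score layout are assumptions of this sketch:

```python
import itertools
import numpy as np

def loss_augmented_argmax(unary, y_true, w_pair):
    """Maximize Delta(y^n, y) + F(x, y) over all binary labelings of an
    H x W grid. unary[i, l]: score of pixel i taking label l."""
    H, W = y_true.shape
    n = H * W
    edges = [(i * W + j, i * W + j + 1) for i in range(H) for j in range(W - 1)]
    edges += [(i * W + j, (i + 1) * W + j) for i in range(H - 1) for j in range(W)]
    best, best_val = None, -np.inf
    for labels in itertools.product([0, 1], repeat=n):
        y = np.array(labels)
        val = unary[np.arange(n), y].sum()                   # unary scores
        val += w_pair * sum(y[a] == y[b] for a, b in edges)  # agreement reward
        val += np.sum(y != y_true.ravel())                   # Hamming loss term
        if val > best_val:
            best, best_val = y.reshape(H, W), val
    return best

y_true = np.zeros((2, 2), dtype=int)
unary = np.random.randn(4, 2)
print(loss_augmented_argmax(unary, y_true, w_pair=0.5))
```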
Extension: Image segmentation with connectedness constraints



 Knowing that the object is connected improves segmentation quality.




    [Figure: ordinary segmentation ← original image → connected segmentation]
Segmentation as Structured Learning:
        Given: training examples (x^n, y^n)_{n=1,...,N}
        Wanted: prediction function f : X → Y where
                X = {all images (as superpixels)}
                Y = {all connected binary segmentations}




• S. Nowozin, C.L.: Global Connectivity Potentials for Random Field Models, CVPR 2009.
Feature functions, unary terms c = {i}:

    ϕ_i(x, y_i) = (0, h(x_i))   for y_i = 0,
                  (h(x_i), 0)   for y_i = 1,

    where h(x_i) is the bag-of-visual-words histogram of the superpixel i.

Feature functions, pairwise terms c = {i, j}:
    ϕ_ij(y_i, y_j) = ⟦y_i = y_j⟧.

Loss function: Hamming loss
    ∆(y, y′) = ∑_i ⟦y_i ≠ y′_i⟧
How to solve   f(x) = argmax_{y connected} ∆(y^n, y) + F(x^n, y) ?

Linear programming relaxation with connectivity constraints:
    rewrite the energy so that it is linear in new variables µ_i^l and µ_ij^{ll′}:

        F(x, y) = ∑_i [ w₁ h_i(x) µ_i^1 + w₂ h_i(x) µ_i^{−1} ] + ∑_{l≠l′} w₃ µ_ij^{ll′}

    subject to

        µ_i^l ∈ {0, 1},   µ_ij^{ll′} ∈ {0, 1},

        ∑_l µ_i^l = 1,   ∑_{l′} µ_ij^{ll′} = µ_i^l,   ∑_l µ_ij^{ll′} = µ_j^{l′}

    relax to µ_i^l ∈ [0, 1] and µ_ij^{ll′} ∈ [0, 1]
    solve the linear program with additional linear constraints:

        µ_i^1 + µ_j^1 − ∑_{k∈S} µ_k^1 ≤ 1   for any set S of nodes separating i and j.
Example Results:

    [Figure: original image, plain segmentation, segmentation with connectivity]

. . . still room for improvement . . .
Summary


Machine Learning of Structured Outputs
   Task: predict f : X → Y for (almost) arbitrary Y
   Key idea:
          learn scoring function F : X × Y → R
           predict using f(x) := argmax_y F(x, y)

Structured Support Vector Machines
    Parametrize F(x, y) = ⟨w, ϕ(x, y)⟩
    Learn w from training data by a maximum-margin criterion
    Needs only:
          a feature function ϕ(x, y)
          a loss function ∆(y, y′)
          a routine to solve argmax_y ∆(y^n, y) + F(x^n, y)
Applications
   Many different applications in a unified framework
        Natural Language Processing: parsing
        Computational Biology: secondary structure prediction
        Computer Vision: pose estimation, object
        localization/segmentation
        ...


Open Problems
   Theory:
        which output structures are useful?
        (how) can we use approximate argmax_y ?
    Practice:
        more applications? new domains?
        training speed!

                           Thank you!
