Topics in Optimization
Copyright 2019 QuantUniversity LLC.
Presented By:
Anish Shah, CFA
Optimization Lead
QuantUniversity
• Anish leads Optimization workshops at
QuantUniversity.
• Anish has worked as an investment quant at the
alternative data hedge fund Cargometrics and at
Northfield and ITG.
• He holds an MS in Applied Math from Brown
University and an MS in Operations Research from
the University of California, Berkeley.
Anish Shah
Optimization Lead
Once you work on data, optimization is as fundamental as linear
algebra and appears everywhere, if not explicitly, then under the
hood
Inference typically happens by maximizing a likelihood or probability
OLS regression (solved analytically)
median regression
GARCH
fitting parameters to match the data, e.g. in a Kalman Filter
Why should a quant care? Inference.
Model something, add optimization, and create a “robot”, a tool that
does the task automatically and better
Google Maps models roads and finds short paths
Expedia models flights and finds inexpensive trips
Portfolio optimization models security returns’ movement and maximizes risk-
adjusted return
Algorithmic trading models how stock prices move in response to orders and
executes trades at low cost and risk
Opportunity to create and add tremendous value
Why care? Build robots.
Optimization puts the learning in machine learning
‘Training’ = minimizing a loss function
In classification, e.g. cross-entropy = – log[p(correct label)]
In numeric, e.g. squared error = (true value – prediction)²
To train a network to identify cats
Using many photos labeled cat or not, optimize its weights to minimize
avg cross-entropy of [network’s predicted probability, true label]
Why care? Machine Learning.
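The two losses above can be computed directly; a minimal numpy sketch (function and array names are illustrative):

```python
import numpy as np

# Cross-entropy for classification: -log of the probability the
# model assigns to the correct label.
def cross_entropy(p_correct):
    return -np.log(p_correct)

# Squared error for a numeric prediction.
def squared_error(true_value, prediction):
    return (true_value - prediction) ** 2

# A confident correct prediction has low loss; an unsure one is penalized.
print(cross_entropy(np.array([0.9, 0.5, 0.1])))  # loss rises as p(correct) falls
print(squared_error(3.0, 2.5))                   # 0.25
```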
In real-world setups, there is no guarantee that the optimization is
solvable
However, particular well-studied classes have good algorithms and
assurances of finding a best solution
Knowing these classes lets you restate or approximate your problem as
one of them
Even if you can’t, knowing how things can fail lets you proceed
skillfully
Why care? Knowledge improves outcomes.
x : variables that you can play with
e.g. in portfolio optimization, one entry could be % invested in AMZN
in finding the shortest route, it could be 1 if a road is used, 0 if not
f(x) : ‘objective function’ - criterion you want to maximize or
minimize
e.g. portfolio’s forecasted risk-penalized return – transaction costs
expected time from point A to B – a penalty on variability
h(x) ≤ b : constraints
e.g. no more than 20% of portfolio in tech stocks
at every node (excluding start and finish), sum roads entering = sum exits
General setup
Goal: maximize expected return with a penalty on variance and costs
x : vector of security weights to be found x₀ : vector of initial security weights
r : forecast return Σ : forecast covariance
λ : risk aversion ɣ : cost aversion
A, b : define linear constraints
max_x rᵀx – λ xᵀΣ x – ɣ cost(x – x₀)
s.t. A x ≤ b
Example: (vanilla) portfolio optimization
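Dropping the costs and the constraint A x ≤ b, the objective above has a closed-form maximizer: setting the gradient r – 2λΣx to zero gives x* = (2λΣ)⁻¹r. A numpy sketch with illustrative numbers (a real run would hand the full constrained problem to a QP solver such as cvxopt, listed later in the tools slide):

```python
import numpy as np

# Unconstrained mean-variance:  max_x  r'x - lam * x' Sigma x
# Gradient r - 2*lam*Sigma*x = 0  =>  x* = (2*lam*Sigma)^{-1} r
r = np.array([0.05, 0.03, 0.04])            # forecast returns (illustrative)
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.08, 0.02],
                  [0.01, 0.02, 0.06]])      # forecast covariance (illustrative)
lam = 2.0                                   # risk aversion

x_star = np.linalg.solve(2 * lam * Sigma, r)
print(x_star)                               # optimal security weights
```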
Convex optimization - minimizing a bowl shaped function under
constraints that, if met by any two points, are also met by all the
points on the line between them
Linear programming - linear objective, linear constraints
Quadratic programming - quadratic objective, linear constraints
SOCP (second-order cone programming) - linear objective, second-order
cone constraints
All other convex problems are solved by general-purpose algorithms
* Note: ‘programming’ just means optimization
Structures that make life easy
Linear constraints
A x ≤ b
Linear objective function in linear programming
min cᵀx
Quadratic objective function in quadratic programming
min ½ xᵀQ x + cᵀx
Linear and quadratic programming
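When the QP has only equality constraints A x = b (inequalities need a full solver), the KKT optimality conditions collapse to one linear system, solvable directly in numpy. A sketch with made-up data:

```python
import numpy as np

# Equality-constrained QP:  min 1/2 x'Qx + c'x  s.t.  A x = b
# The KKT conditions give the linear system
#   [ Q  A' ] [x ]   [-c]
#   [ A  0  ] [nu] = [ b]
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])     # positive definite
c = np.array([-1.0, 1.0])
A = np.array([[1.0, 1.0]])     # one constraint: x1 + x2 = 1
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
KKT = np.block([[Q, A.T],
                [A, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x, nu = sol[:n], sol[n:]       # minimizer and Lagrange multiplier
print(x)
```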
Assorted graph problems - shortest path, maximum flow, …
Algorithm is specific to the problem
Dynamic programming - a general principle used when the problem
separates into contingent subproblems
e.g. the shortest way from NY → LA can be broken into finding
1) the shortest way from NY → every bridge that crosses the Mississippi
2) the shortest way from every bridge that crosses the Mississippi → LA
Encounter things of this flavor in reinforcement learning
EM (expectation maximization) - a very particular structure, solvable
in alternating steps
Structures that make life easy (2)
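The NY → LA decomposition above is the principle behind shortest-path algorithms; a pure-Python Dijkstra sketch on a toy network (node names mirror the slide's split and are purely illustrative):

```python
import heapq

def shortest_path_cost(graph, start, goal):
    """Dijkstra: repeatedly settle the nearest not-yet-visited node."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            return d
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

# Toy network: two 'bridges' between NY and LA, as in the slide's split.
graph = {
    "NY": [("bridge1", 3.0), ("bridge2", 5.0)],
    "bridge1": [("LA", 6.0)],
    "bridge2": [("LA", 2.0)],
}
print(shortest_path_cost(graph, "NY", "LA"))  # 7.0, via bridge2
```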
Non-convex objective functions – many extrema
Non-convex constraints – path connecting acceptable points violates
constraints
Integer constraints or variables
e.g. in portfolio optimization, trading securities in fixed sizes (round lots
of 100 shares, whole futures contracts)
Solved by a combination of heuristics and brute force
Structures that make life hard
Relaxation - sequentially approximate as an easier problem and solve
Iterate - from wherever you are, move in a direction that meets
constraints and improves the objective, perhaps leaving opportunity
to jump around (Metropolis) to avoid stalling at a local extremum
What do you do outside the easy cases?
Imagine fitting a machine learning model, with N observations of
labeled data
zᵢ : data for the ith observation
yᵢ : label for the ith observation
x : parameters (to be fit) of the model
g(x, zᵢ) : model’s output for observation i
Given some loss function f, the goal is
Minimize over x: ∑i=1..N loss(labelᵢ, model outputᵢ) = ∑i=1..N f(yᵢ, g(x, zᵢ))
Following the gradient
Want to minimize over x: ∑i=1..N f(yᵢ, g(x, zᵢ))
Start with an initial value of x and step to improve the objective
Why not step along the (negative) gradient?
x ← x – a ∑i=1..N ∇f(yᵢ, g(x, zᵢ))
Called ‘steepest descent’ in the optimization literature
Long regarded as a bad idea, slowed by differences in scale across
variables
Better is to multiply the gradient by the inverse of the curvature
(the 2nd derivatives), as in Newton’s method
Following the gradient (2)
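Full-batch steepest descent on the simplest instance, a least-squares fit, looks like this in numpy; the data are simulated and the learning rate is an illustrative choice:

```python
import numpy as np

# Full-batch steepest descent on a least-squares fit:
# every step uses the gradient averaged over all N observations.
rng = np.random.default_rng(0)
N = 200
Z = rng.normal(size=(N, 3))                  # z_i : data for observation i
x_true = np.array([1.0, -2.0, 0.5])
y = Z @ x_true + 0.01 * rng.normal(size=N)   # y_i : labels

x = np.zeros(3)
a = 0.5                                      # learning rate
for _ in range(500):
    grad = Z.T @ (Z @ x - y) / N             # avg gradient of 1/2 (z_i.x - y_i)^2
    x = x - a * grad
print(x)                                     # close to x_true
```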
Want to minimize over x: ∑i=1..N f(yᵢ, g(x, zᵢ))
Refine x using one or a few sampled points rather than all the data
x ← x – a ∇f(yₖ, g(x, zₖ))
k : the random sample
a : the learning rate, which is decayed over time
Much lower computational cost
Works nearly as well, except for the final refinement
Steepest descent reemerges as
stochastic gradient descent (SGD) in deep learning
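The same least-squares problem, now updating from one randomly sampled observation per step with a decaying learning rate; a minimal SGD sketch with illustrative hyperparameters:

```python
import numpy as np

# Stochastic gradient descent: one sampled observation per update.
rng = np.random.default_rng(1)
N = 500
Z = rng.normal(size=(N, 3))
x_true = np.array([1.0, -2.0, 0.5])
y = Z @ x_true + 0.01 * rng.normal(size=N)

x = np.zeros(3)
a = 0.1                          # initial learning rate
for t in range(20000):
    k = rng.integers(N)          # the random sample
    grad = (Z[k] @ x - y[k]) * Z[k]
    x = x - a * grad
    a *= 0.9997                  # decay the learning rate over time
print(x)                         # near x_true, at far lower cost per step
```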
Stops at a ‘local maximum’ if the function being maximized is not a
single mountain peak but a mountain range
Proceeds slowly and expensively (AWS $) because of a poorly tuned or
inappropriate algorithm
Chases noise or overfits, yielding poor out-of-sample results
How can optimization go wrong?
Suppose the goal is min_x f(x, y) where y is noisy data
Robust optimization - finds the best worst case given a range of inputs
min_x max_y f(x, y), where y, rather than being fixed, takes values in a range
hard to solve except for special cases
Regularization - penalizes the norm of the decision variable
min_x f(x, y) + λ‖x‖
Ridge regression, lasso penalty, ...
Tightly connected to Bayesian statistics
Addressing noise and overfitting
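Ridge regression, the simplest regularized problem above, has a closed form: adding λ‖x‖² to the least-squares objective gives x = (ZᵀZ + λI)⁻¹Zᵀy. A numpy sketch on simulated data showing the shrinkage toward zero:

```python
import numpy as np

# Ridge regression:  min_x |y - Z x|^2 + lam * |x|^2
# Closed form:       x = (Z'Z + lam*I)^{-1} Z'y
rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 4))
y = Z @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.1 * rng.normal(size=50)

def ridge(Z, y, lam):
    n = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(n), Z.T @ y)

x_ols = ridge(Z, y, 0.0)       # plain least squares
x_reg = ridge(Z, y, 10.0)      # coefficients shrunk toward zero
print(np.linalg.norm(x_reg), "<", np.linalg.norm(x_ols))
```

The penalty trades a little in-sample fit for stability out of sample, which is the Bayesian-prior connection the slide mentions.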
Portfolio optimization
(Anish Shah 2015) Uncertain covariance and (patent-pending)
uncertainty-penalized utility
Maximize E[utility] – ɣ stdev[utility]
Deep learning - ‘dropout’
At each training optimization step, randomly omit connections so
information isn’t overfit and gets spread throughout the network
More on regularization
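The dropout idea can be sketched in a few lines of numpy ('inverted' dropout, the common formulation: zero each activation with probability p at train time and scale survivors by 1/(1-p), so the layer's expected output is unchanged):

```python
import numpy as np

# Inverted dropout: randomly omit activations during training so no
# single connection is relied on, spreading information through the net.
def dropout(activations, p, rng):
    mask = rng.random(activations.shape) >= p   # keep with prob 1-p
    return activations * mask / (1.0 - p)       # rescale survivors

rng = np.random.default_rng(3)
h = np.ones(100000)
h_drop = dropout(h, p=0.5, rng=rng)
print(h_drop.mean())   # close to 1.0: the expectation is preserved
```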
• cvxopt – convex optimizer library
• nlopt – nonlinear optimizer library
• CBC – mixed integer programming library
• Google OR Tools – LP and assorted graph algorithms
• CPLEX – commercial optimizer for the hardest problems
Optimization tools
• Portfolio optimization
• Algorithmic trading
• Index/ETF construction
• Hedging
• Asset/liability matching
• Nowcasting covariance
Applications in Finance
Optimization is everywhere, especially as ML becomes pervasive
This is the direction quant is headed
Knowledge lets you create and add more value
The optimization piece of ML doesn’t seem fully mature – there are
puzzles to be solved!
Summary
