Topics in Optimization
Copyright 2019 QuantUniversity LLC.
Presented By:
Anish Shah, CFA
Optimization Lead
QuantUniversity
• Anish leads Optimization workshops at
QuantUniversity.
• Anish has worked as an investment quant at the
alternative data hedge fund Cargometrics and at
Northfield and ITG.
• He holds an MS in Applied Math from Brown
University and an MS in Operations Research from
the University of California, Berkeley.
Anish Shah
Optimization Lead
Once you work on data, optimization is as fundamental as linear
algebra and appears everywhere, if not explicitly, then under the
hood
Inference typically happens by maximizing a likelihood or probability
OLS regression (solved analytically)
median regression
GARCH
fitting parameters to match the data, e.g. in a Kalman Filter
Why should a quant care? Inference.
Model something, add optimization, and create a “robot”, a tool that
does the task automatically and better
Google Maps models roads and finds short paths
Expedia models flights and finds inexpensive trips
Portfolio optimization models security returns’ movement and maximizes risk-
adjusted return
Algorithmic trading models how stock prices move in response to orders and
executes trades at low cost and risk
Opportunity to create and add tremendous value
Why care? Build robots.
Optimization puts the learning in machine learning
‘Training’ = minimizing a loss function
In classification, e.g. cross-entropy = – log[p(correct label)]
In numeric, e.g. squared error = (true value – prediction)²
To train a network to identify cats
Using many photos labeled cat or not, optimize its weights to minimize
avg cross-entropy of [network’s predicted probability, true label]
Why care? Machine Learning.
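The two losses above can be computed directly; a minimal numpy sketch (function and array names are illustrative):

```python
import numpy as np

# Cross-entropy for classification: -log of the probability the
# model assigns to the correct label.
def cross_entropy(p_correct):
    return -np.log(p_correct)

# Squared error for a numeric prediction.
def squared_error(true_value, prediction):
    return (true_value - prediction) ** 2

# A confident correct prediction has low loss; an unsure one is penalized.
print(cross_entropy(np.array([0.9, 0.5, 0.1])))  # loss rises as p(correct) falls
print(squared_error(3.0, 2.5))                   # 0.25
```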
In real-world setups, there is no guarantee that the optimization is
solvable
However, particular well-studied classes have good algorithms and
assurances of finding a best solution
Knowing these classes lets you restate or approximate your problem as
one of them
Even if you can’t, knowing how things can fail lets you proceed
skillfully
Why care? Knowledge improves outcomes.
x : variables that you can play with
e.g. in portfolio optimization, one entry could be % invested in AMZN
in finding the shortest route, it could be 1 if a road is used, 0 if not
f(x) : ‘objective function’ - criterion you want to maximize or
minimize
e.g. portfolio’s forecasted risk-penalized return – transaction costs
expected time from point A to B – a penalty on variability
h(x) ≤ b : constraints
e.g. no more than 20% of portfolio in tech stocks
at every node (excluding start and finish), sum roads entering = sum exits
General setup
Goal: maximize expected return with a penalty on variance and costs
x : vector of security weights to be found x₀ : vector of initial security weights
r : forecast return Σ : forecast covariance
λ : risk aversion ɣ : cost aversion
A, b : define linear constraints
max_x rᵀx – λ xᵀΣ x – ɣ cost(x – x₀)
s.t. A x ≤ b
Example: (vanilla) portfolio optimization
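Dropping the costs and the constraint A x ≤ b, the objective above has a closed-form maximizer: setting the gradient r – 2λΣx to zero gives x* = (2λΣ)⁻¹r. A numpy sketch with illustrative numbers (a real run would hand the full constrained problem to a QP solver such as cvxopt, listed later in the tools slide):

```python
import numpy as np

# Unconstrained mean-variance:  max_x  r'x - lam * x' Sigma x
# Gradient r - 2*lam*Sigma*x = 0  =>  x* = (2*lam*Sigma)^{-1} r
r = np.array([0.05, 0.03, 0.04])            # forecast returns (illustrative)
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.08, 0.02],
                  [0.01, 0.02, 0.06]])      # forecast covariance (illustrative)
lam = 2.0                                   # risk aversion

x_star = np.linalg.solve(2 * lam * Sigma, r)
print(x_star)                               # optimal security weights
```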
Convex optimization - minimizing a bowl shaped function under
constraints that, if met by any two points, are also met by all the
points on the line between them
Linear programming - linear objective, linear constraints
Quadratic programming - quadratic objective, linear constraints
SOCP (second-order cone programming) - linear objective, second-order
cone constraints
All other convex problems are solved by general-purpose algorithms
* Note: ‘programming’ just means optimization
Structures that make life easy
Linear constraints
A x ≤ b
Linear objective function in linear programming
min cᵀx
Quadratic objective function in quadratic programming
min ½ xᵀQ x + cᵀx
Linear and quadratic programming
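When the QP has only equality constraints A x = b (inequalities need a full solver), the KKT optimality conditions collapse to one linear system, solvable directly in numpy. A sketch with made-up data:

```python
import numpy as np

# Equality-constrained QP:  min 1/2 x'Qx + c'x  s.t.  A x = b
# The KKT conditions give the linear system
#   [ Q  A' ] [x ]   [-c]
#   [ A  0  ] [nu] = [ b]
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])     # positive definite
c = np.array([-1.0, 1.0])
A = np.array([[1.0, 1.0]])     # one constraint: x1 + x2 = 1
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
KKT = np.block([[Q, A.T],
                [A, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x, nu = sol[:n], sol[n:]       # minimizer and Lagrange multiplier
print(x)
```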
Assorted graph problems - shortest path, maximum flow, …
Algorithm is specific to the problem
Dynamic programming - a general principle used when the problem
separates into contingent subproblems
e.g. the shortest way from NY → LA can be broken into finding
1) the shortest way from NY → every bridge that crosses the Mississippi
2) the shortest way from every bridge that crosses the Mississippi → LA
Encounter things of this flavor in reinforcement learning
EM (expectation maximization) - a very particular structure, solvable
in alternating steps
Structures that make life easy (2)
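The NY → LA decomposition above is the principle behind shortest-path algorithms; a pure-Python Dijkstra sketch on a toy network (node names mirror the slide's split and are purely illustrative):

```python
import heapq

def shortest_path_cost(graph, start, goal):
    """Dijkstra: repeatedly settle the nearest not-yet-visited node."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            return d
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

# Toy network: two 'bridges' between NY and LA, as in the slide's split.
graph = {
    "NY": [("bridge1", 3.0), ("bridge2", 5.0)],
    "bridge1": [("LA", 6.0)],
    "bridge2": [("LA", 2.0)],
}
print(shortest_path_cost(graph, "NY", "LA"))  # 7.0, via bridge2
```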
Non-convex objective functions – many extrema
Non-convex constraints – path connecting acceptable points violates
constraints
Integer constraints or variables
e.g. in portfolio optimization, trading securities in fixed sizes (round lots
of 100 shares, whole futures contracts)
Solved by a combination of heuristics and brute force
Structures that make life hard
Relaxation - sequentially approximate as an easier problem and solve
Iterate - from wherever you are, move in a direction that meets
constraints and improves the objective, perhaps leaving opportunity
to jump around (Metropolis) to avoid stalling at a local extremum
What do you do outside the easy cases?
Imagine fitting a machine learning model, with N observations of
labeled data
zᵢ : data for the ith observation
yᵢ : label for the ith observation
x : parameters (to be fit) of the model
g(x, zᵢ) : model’s output for observation i
Given some loss function f, the goal is
Minimize over x: ∑i=1..N loss(labelᵢ, model outputᵢ) = ∑i=1..N f(yᵢ, g(x, zᵢ))
Following the gradient
Want to minimize over x: ∑i=1..N f(yᵢ, g(x, zᵢ))
Start with an initial value of x and step to improve the objective
Why not step along the (negative) gradient?
x ← x – a ∑i=1..N ∇f(yᵢ, g(x, zᵢ))
Called ‘steepest descent’ in the optimization literature
Long regarded as a bad idea, slowed by differences in scale across
variables
Better is to multiply the gradient by the inverse of the curvature
(the 2nd derivatives), as in Newton’s method
Following the gradient (2)
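Full-batch steepest descent on the simplest instance, a least-squares fit, looks like this in numpy; the data are simulated and the learning rate is an illustrative choice:

```python
import numpy as np

# Full-batch steepest descent on a least-squares fit:
# every step uses the gradient averaged over all N observations.
rng = np.random.default_rng(0)
N = 200
Z = rng.normal(size=(N, 3))                  # z_i : data for observation i
x_true = np.array([1.0, -2.0, 0.5])
y = Z @ x_true + 0.01 * rng.normal(size=N)   # y_i : labels

x = np.zeros(3)
a = 0.5                                      # learning rate
for _ in range(500):
    grad = Z.T @ (Z @ x - y) / N             # avg gradient of 1/2 (z_i.x - y_i)^2
    x = x - a * grad
print(x)                                     # close to x_true
```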
Want to minimize over x: ∑i=1..N f(yᵢ, g(x, zᵢ))
Refine x using one or a few sampled points rather than all the data
x ← x – a ∇f(yₖ, g(x, zₖ))
k : the random sample
a : the learning rate, which is decayed over time
Much lower computational cost
Works nearly as well, except for the final refinement
Steepest descent reemerges as
stochastic gradient descent (SGD) in deep learning
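The same least-squares problem, now updating from one randomly sampled observation per step with a decaying learning rate; a minimal SGD sketch with illustrative hyperparameters:

```python
import numpy as np

# Stochastic gradient descent: one sampled observation per update.
rng = np.random.default_rng(1)
N = 500
Z = rng.normal(size=(N, 3))
x_true = np.array([1.0, -2.0, 0.5])
y = Z @ x_true + 0.01 * rng.normal(size=N)

x = np.zeros(3)
a = 0.1                          # initial learning rate
for t in range(20000):
    k = rng.integers(N)          # the random sample
    grad = (Z[k] @ x - y[k]) * Z[k]
    x = x - a * grad
    a *= 0.9997                  # decay the learning rate over time
print(x)                         # near x_true, at far lower cost per step
```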
Stops at a ‘local maximum’ if the function being maximized is not a
single mountain peak but a mountain range
Proceeds slowly and expensively (AWS $) because of a poorly tuned or
inappropriate algorithm
Chases noise or overfits, yielding poor out-of-sample results
How can optimization go wrong?
Suppose the goal is min_x f(x, y) where y is noisy data
Robust optimization - finds the best worst case given a range of inputs
min_x max_y f(x, y), where y, rather than being fixed, takes values in a range
hard to solve except for special cases
Regularization - penalizes the norm of the decision variable
min_x f(x, y) + λ‖x‖
Ridge regression, lasso penalty, ...
Tightly connected to Bayesian statistics
Addressing noise and overfitting
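Ridge regression, the simplest regularized problem above, has a closed form: adding λ‖x‖² to the least-squares objective gives x = (ZᵀZ + λI)⁻¹Zᵀy. A numpy sketch on simulated data showing the shrinkage toward zero:

```python
import numpy as np

# Ridge regression:  min_x |y - Z x|^2 + lam * |x|^2
# Closed form:       x = (Z'Z + lam*I)^{-1} Z'y
rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 4))
y = Z @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.1 * rng.normal(size=50)

def ridge(Z, y, lam):
    n = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(n), Z.T @ y)

x_ols = ridge(Z, y, 0.0)       # plain least squares
x_reg = ridge(Z, y, 10.0)      # coefficients shrunk toward zero
print(np.linalg.norm(x_reg), "<", np.linalg.norm(x_ols))
```

The penalty trades a little in-sample fit for stability out of sample, which is the Bayesian-prior connection the slide mentions.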
Portfolio optimization
(Anish Shah 2015) Uncertain covariance and (patent-pending)
uncertainty-penalized utility
Maximize E[utility] – ɣ stdev[utility]
Deep learning - ‘dropout’
At each training optimization step, randomly omit connections so
information isn’t overfit and gets spread throughout the network
More on regularization
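The dropout idea can be sketched in a few lines of numpy ('inverted' dropout, the common formulation: zero each activation with probability p at train time and scale survivors by 1/(1-p), so the layer's expected output is unchanged):

```python
import numpy as np

# Inverted dropout: randomly omit activations during training so no
# single connection is relied on, spreading information through the net.
def dropout(activations, p, rng):
    mask = rng.random(activations.shape) >= p   # keep with prob 1-p
    return activations * mask / (1.0 - p)       # rescale survivors

rng = np.random.default_rng(3)
h = np.ones(100000)
h_drop = dropout(h, p=0.5, rng=rng)
print(h_drop.mean())   # close to 1.0: the expectation is preserved
```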
• cvxopt – convex optimizer library
• nlopt – nonlinear optimizer library
• CBC – mixed integer programming library
• Google OR Tools – LP and assorted graph algorithms
• CPLEX – commercial optimizer for the hardest problems
Optimization tools
• Portfolio optimization
• Algorithmic trading
• Index/ETF construction
• Hedging
• Asset/liability matching
• Nowcasting covariance
Applications in Finance
Optimization is everywhere, especially as ML becomes pervasive
This is the direction quant is headed
Knowledge lets you create and add more value
The optimization piece of ML doesn’t seem fully mature – there are
puzzles to be solved!
Summary
