Learning to Trade with Q-Reinforcement Learning
(A TensorFlow and Python focus)
Ben Ball & David Samuel
www.prediction-machines.com
Special thanks to -
Algorithmic Trading (e.g., HFT) vs Human Systematic Trading
Often looking at opportunities existing on the
microsecond time horizon, typically using statistical
microstructure models and techniques from machine
learning. Automated, but usually with hand-crafted
signals, exploits, and algorithms.
Predicting the next tick
Systematic strategies learned from the experience of
“reading market behavior”, operating over
tens of seconds or minutes.
Predicting complex plays
Take inspiration from DeepMind – Learning to play Atari video games
[Network diagram: Input → FC ReLU → FC ReLU → Functional pass-through → Output]
Could we do something similar for trading markets?
*Network images from http://www.asimovinstitute.org/neural-network-zoo/
Introduction to Reinforcement Learning
How does a child learn to ride a bike?
Lots of this, leading to this, rather than this . . .
Machine Learning vs Reinforcement Learning
• No supervisor
• Trial and error paradigm
• Feedback delayed
• Time sequenced
• Agent influences the environment
[Diagram: Agent ↔ Environment loop - the agent takes action a_t in state S_t; the environment returns reward r_t and next state S_t+1]
Good textbook on this by
Sutton and Barto -
s_t, a_t, r_t, s_t+1, a_t+1, r_t+1, s_t+2, a_t+2, r_t+2, …
REINFORCEjs
GridWorld:
---demo---
Value function
Policy function
Reward function
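To make the value, policy, and reward functions concrete, here is a minimal tabular Q-learning sketch in the spirit of the GridWorld demo (our own illustration; the environment interface, state/action encoding, and hyperparameters are assumptions): the agent improves its action-value estimates from (state, action, reward, next state) transitions, and the greedy policy is read off the learned Q table.

```python
import random
from collections import defaultdict

# Illustrative tabular Q-learning for a small GridWorld-style MDP.
# A hypothetical `env` is assumed to expose reset() -> state and
# step(action) -> (next_state, reward, done).
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration
ACTIONS = [0, 1, 2, 3]                   # e.g. up, down, left, right

Q = defaultdict(float)                   # Q[(state, action)] -> value estimate

def act(state):
    """Epsilon-greedy policy derived from the current Q table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def learn(s, a, r, s_next, done):
    """One Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = r if done else r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```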
Application to Trading
Typical dynamics of a mean-reverting asset or pairs-trading where the spread exhibits mean reversion
[Chart: price of mean-reverting asset (or spread) evolving with time, oscillating around the mean between upper and lower soft range boundaries]
Map the movement of the mean
reverting asset (or spread) into a
discrete lattice where the price
dynamics become transitions
between lattice nodes.
We started with a simple 5 node
lattice but this can be increased
quite easily.
State transitions of lattice simulation of mean reversion:
[Lattice diagram: spread price mapped onto lattice index i ∈ {-2, -1, 0, +1, +2}; position states Short / Flat / Long; sell and buy actions move between them]
These map into:
(State, Action, Reward)
triplets used in the QRL algorithm
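A sketch of the lattice mapping just described (illustrative only; the node spacing, the position convention, and the reward definition are our assumptions, not the exact simulator): prices are bucketed into the five lattice indices, and each step emits one (State, Action, Reward) triplet for the QRL algorithm.

```python
# Map a mean-reverting price (or spread) onto a 5-node lattice, i in {-2,...,2}.
# Node spacing (half a standard deviation) is an illustrative assumption.
def to_lattice(price, mean, sigma, n_nodes=5):
    half = n_nodes // 2
    i = int(round((price - mean) / (0.5 * sigma)))
    return max(-half, min(half, i))      # clip at the soft boundaries

# Build (state, action, reward) triplets from a price path and an action sequence.
# Assumed convention: position in {-1, 0, +1} = short/flat/long;
# reward is the P&L of the position held over the step.
def sar_triplets(prices, actions, mean, sigma):
    triplets, position = [], 0
    for t in range(len(prices) - 1):
        state = (to_lattice(prices[t], mean, sigma), position)
        action = actions[t]              # -1 = sell, 0 = do nothing, +1 = buy
        position = max(-1, min(1, position + action))
        reward = position * (prices[t + 1] - prices[t])
        triplets.append((state, action, reward))
    return triplets
```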
Mean Reversion Game Simulator
[Simulator screenshot: Level 3, showing an example buy transaction and an example sell transaction]
http://www.prediction-machines.com/blog/ - for demonstration
As per the Atari games example, our QRL/DQN plays the trading game … over and over
Building a DQN and defining its topology
Using Keras and Trading-Gym
[Diagram: training network and target network, two identical copies, each Input → FC ReLU → FC ReLU → Functional pass-through → Output. Input: lattice position and (long, short, flat) position state. Outputs: value of Buy, value of Sell, value of Do Nothing.]
Double Dueling DQN (vanilla DQN does not converge well, but this method works much better)
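Below is a minimal Keras sketch of the dueling head (an illustration consistent with the slide, not the exact production network; the layer widths, input encoding, and optimizer are assumptions): two FC ReLU layers feed separate state-value and advantage streams, recombined as Q = V + (A − mean(A)). The target network is simply a second copy whose weights are periodically synced from the training network.

```python
from tensorflow.keras import layers, models, backend as K

# Minimal dueling DQN head in Keras (layer sizes are illustrative).
# Input: lattice index plus (long, short, flat) position state, encoded as a vector.
# Output: Q-values for the three actions Buy, Sell, Do Nothing.
def build_dueling_dqn(state_dim=8, n_actions=3):
    state = layers.Input(shape=(state_dim,))
    x = layers.Dense(32, activation='relu')(state)   # FC ReLU
    x = layers.Dense(32, activation='relu')(x)       # FC ReLU

    value = layers.Dense(1)(x)                       # state value V(s)
    advantage = layers.Dense(n_actions)(x)           # advantages A(s, a)

    # Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)); the mean-subtraction keeps
    # the value and advantage streams identifiable.
    q = layers.Lambda(
        lambda va: va[0] + (va[1] - K.mean(va[1], axis=1, keepdims=True))
    )([value, advantage])

    model = models.Model(inputs=state, outputs=q)
    model.compile(optimizer='adam', loss='mse')
    return model

# A second copy of this network serves as the (periodically synced) target network.
```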
Trading-Gym Architecture
[Class diagram]
Runner: warmup(), train(), run()
Agent (abstract class; children classes: DQN, Double DQN, A3C): act(), observe(), end()
Memory: add(), sample()
Brain: train(), predict()
Data Generator: Random Walks, Deterministic Signals, CSV Replay, Market Data Streamer (Single Asset, Multi Asset, Market Making)
Environment: render(), step(), reset(), next(), rewind()
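A skeleton of how these pieces interact (an illustrative reading of the diagram above; method bodies and signatures are assumptions, not the actual Trading-Gym source): the Runner drives the Environment, the Agent acts and observes, the Memory stores transitions, and the Brain learns from sampled batches.

```python
# Illustrative skeleton of the architecture above (method names follow the
# diagram; bodies and signatures are assumptions, not the actual source).
class Memory:                 # experience store
    def add(self, transition): ...
    def sample(self, n): ...

class Brain:                  # wraps the neural network
    def train(self, batch): ...
    def predict(self, state): ...

class Agent:                  # subclassed by DQN, Double DQN, A3C
    def __init__(self, brain, memory):
        self.brain, self.memory = brain, memory
    def act(self, state): ...            # pick an action (e.g. epsilon-greedy)
    def observe(self, transition):       # store the transition and learn
        self.memory.add(transition)
        self.brain.train(self.memory.sample(32))
    def end(self): ...                   # episode cleanup

class Runner:
    def __init__(self, agent, env):
        self.agent, self.env = agent, env
    def run(self, episodes):
        for _ in range(episodes):        # play the trading game over and over
            state, done = self.env.reset(), False
            while not done:
                action = self.agent.act(state)
                next_state, reward, done, _ = self.env.step(action)
                self.agent.observe((state, action, reward, next_state, done))
                state = next_state
            self.agent.end()
```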
Trading-Gym - OpenSourced
Prediction Machines release of Trading-Gym environment into OpenSource
- - demo - -
TensorFlow TradingBrain - released soon
TensorFlow TradingGym - available now, with Brain and DQN example
References:
Insights In Reinforcement Learning (PhD thesis) by Hado van Hasselt
Human-level control through deep reinforcement learning
V Mnih, K Kavukcuoglu, D Silver, AA Rusu, J Veness, MG Bellemare, ...
Nature 518 (7540), 529-533
Deep Reinforcement Learning with Double Q-Learning
H Van Hasselt, A Guez, D Silver
AAAI, 2094-2100
Prioritized experience replay
T Schaul, J Quan, I Antonoglou, D Silver
arXiv preprint arXiv:1511.05952
Dueling Network Architectures for Deep Reinforcement Learning
Z Wang, T Schaul, M Hessel, H van Hasselt, M Lanctot, N de Freitas
The 33rd International Conference on Machine Learning, 1995–2003
DDDQN and TensorFlow
Overview
1. DQN - DeepMind, Feb 2015 (the “DeepMind Nature” paper)
http://www.davidqiu.com:8888/research/nature14236.pdf
a. Experience Replay
b. Separate Target Network
2. DDQN - Double Q-learning. DeepMind, Dec 2015
https://arxiv.org/pdf/1509.06461.pdf
3. Prioritized Experience Replay - DeepMind, Feb 2016
https://arxiv.org/pdf/1511.05952.pdf
4. DDDQN - Dueling Double Q-learning. DeepMind, Apr 2016
https://arxiv.org/pdf/1511.06581.pdf
Enhancements
Experience Replay
Removes correlation in sequences
Smooths over changes in data distribution
Prioritized Experience Replay
Speeds up learning by sampling experiences from a weighted (prioritized) distribution
Separate target network from Q network
Removes correlation with the target - improves stability
Double Q learning
Removes much of the non-uniform overestimation by separating action selection from action evaluation
Dueling Q learning
Improves learning when many actions have similar values. Separates the Q value into two parts: a state value and a state-dependent action advantage
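For the Double Q-learning point above, here is the target computation in a few lines of NumPy (a sketch under the standard DDQN convention; the `q_online` and `q_target` callables are assumptions): the training network selects the best next action, while the target network evaluates it.

```python
import numpy as np

# Double Q-learning target (sketch): select the action with the online
# network, evaluate it with the target network. `q_online` / `q_target`
# are assumed callables mapping a batch of states to per-action Q-values.
def ddqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    best_actions = np.argmax(q_online(next_states), axis=1)                    # selection
    next_q = q_target(next_states)[np.arange(len(rewards)), best_actions]     # evaluation
    return rewards + gamma * (1.0 - dones) * next_q
```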
Keras v TensorFlow

                            Keras    TensorFlow
High level                    ✔
Standardized API              ✔
Access to low level                      ✔
TensorBoard                   ✔*         ✔
Understand under the hood                ✔
Can use multiple backends     ✔
Install TensorFlow
My installation was on CentOS in Docker with GPU*, but I also installed locally on
Ubuntu 16 for this demo. *Built from source for maximum speed.
CentOS instructions were adapted from:
https://blog.abysm.org/2016/06/building-tensorflow-centos-6/
Ubuntu install was from:
https://www.tensorflow.org/install/install_sources
TensorFlow - what is it?
A computational graph solver
TensorFlow key API
Namespaces for organizing the graph and displaying it in TensorBoard
with tf.variable_scope('prediction'):
Sessions
with tf.Session() as sess:
Create variables and placeholders
var = tf.placeholder('int32', [None, 2, 3], name='varname')
self.global_step = tf.Variable(0, trainable=False)
Session.run or variable.eval to run parts of the graph and retrieve values
pred_action = self.q_action.eval({self.s_t['p']: s_t_plus_1})
q_t, loss = self.sess.run([q['p'], loss], {target_q_t: target_q_t, action: action})
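Putting those calls together, a minimal self-contained example in the TensorFlow 1.x style used above (variable names are illustrative):

```python
import tensorflow as tf   # TensorFlow 1.x style, matching the snippets above

with tf.variable_scope('prediction'):                   # namespace for TensorBoard
    x = tf.placeholder('float32', [None, 3], name='x')  # fed at run time
    w = tf.Variable(tf.ones([3, 1]), name='w')
    y = tf.matmul(x, w)                                 # node in the graph

global_step = tf.Variable(0, trainable=False)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # session.run evaluates parts of the graph, feeding placeholder values
    y_val, step = sess.run([y, global_step], {x: [[1.0, 2.0, 3.0]]})
    # variable.eval is shorthand for sess.run on a single tensor
    w_val = w.eval()
```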
Trading-Gym
https://github.com/Prediction-Machines/Trading-Gym
Open sourced
Modelled after OpenAI Gym. Compatible with it.
Contains example of DQN with Keras
Contains pair trading example simulator and visualizer
Trading-Brain
https://github.com/Prediction-Machines/Trading-Brain
Two rich examples
Contains the Trading-Gym Keras example with suggested structuring
examples/keras_example.py
Contains example of Dueling Double DQN for single stock trading game
examples/tf_example.py
References
Much of the Brain and config code in this example is adapted from devsisters github:
https://github.com/devsisters/DQN-tensorflow
Our github:
https://github.com/Prediction-Machines
Our blog:
http://prediction-machines.com/blog/
Our job openings:
http://prediction-machines.com/jobopenings/
Video of this presentation:
https://www.youtube.com/watch?v=xvm-M-R2fZY